Lea Verou

Reputation: 23907

How to find character at byte offset in PHP?

I'm trying to troubleshoot an issue with some (apparently) mangled serialized data in a MySQL database, after a conversion to UTF-8. When I try to unserialize them, I get the usual:

Notice: unserialize() [function.unserialize]: Error at offset 1481 of 255200 bytes [...]

However, given that this is a multi-byte string, I can't figure out how to find which character is at that byte offset. What I need is something like substr() but for bytes, instead of characters. How can I do that?

Thanks in advance.

Upvotes: 4

Views: 2013

Answers (2)

Gonzalo Larralde

Reputation: 3541

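Try substr($str, 1481, 2);, substr($str, 1481, 3); or substr($str, 1481, 4);. If the data is UTF-8 and the offset points at the first byte of a multi-byte character, you'll find that character in one of those 3 substrings, because a multi-byte UTF-8 character takes from 2 to 4 bytes, depending on its first byte (plain ASCII characters take just 1).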
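If the offset lands in the middle of a character, none of those substrings will look right. Here is a minimal sketch of a helper that steps back to the start of the character and reads its full length from the lead byte. The function name utf8_char_at_byte_offset is made up for illustration, and the sketch assumes the string is valid UTF-8 and that substr() is operating on bytes:

```php
<?php
// Hypothetical helper: return the whole UTF-8 character that contains
// the byte at $offset. Assumes $str is valid UTF-8 and that substr()
// and strlen() count bytes.
function utf8_char_at_byte_offset(string $str, int $offset): string
{
    $len = strlen($str);
    if ($offset < 0 || $offset >= $len) {
        return '';
    }

    // Step back over continuation bytes (bit pattern 10xxxxxx)
    // until we reach the lead byte of the character.
    $start = $offset;
    while ($start > 0 && (ord($str[$start]) & 0xC0) === 0x80) {
        $start--;
    }

    // The lead byte tells us how many bytes the character occupies.
    $lead = ord($str[$start]);
    if ($lead < 0x80) {
        $width = 1;                     // 0xxxxxxx: ASCII
    } elseif (($lead & 0xE0) === 0xC0) {
        $width = 2;                     // 110xxxxx
    } elseif (($lead & 0xF0) === 0xE0) {
        $width = 3;                     // 1110xxxx
    } elseif (($lead & 0xF8) === 0xF0) {
        $width = 4;                     // 11110xxx
    } else {
        $width = 1;                     // not a valid lead byte: return the raw byte
    }

    return substr($str, $start, $width);
}
```

For your case that would be something like utf8_char_at_byte_offset($str, 1481). If the data was mangled during the conversion, the result may still be garbage, which is itself a useful hint.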

I've had a lot of problems with this, so if you can't work out what's going on with the encoding, post again :-) and I'll try to lend you a hand.

Good luck!

Edit: Don't forget to send header("Content-Type: text/html; charset=utf-8"); so the browser displays the result properly.

Upvotes: 3

Gumbo

Reputation: 655775

substr does work on bytes, not characters (unless mbstring function overloading is enabled). So this should return the byte at offset 1481 (offsets are zero-based):

substr($data, 1481, 1)
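A single byte is rarely enough to see what went wrong, so it may help to look at its neighbours as well. A rough sketch, where bytes_around and its $radius parameter are hypothetical names chosen for this example:

```php
<?php
// Hypothetical helper: hex dump of the bytes surrounding a byte offset.
// Returns up to $radius bytes before the offset, the byte itself,
// and up to $radius bytes after it, as a hexadecimal string.
function bytes_around(string $data, int $offset, int $radius = 8): string
{
    // Clamp the start so we never ask for a negative position.
    $start = max(0, $offset - $radius);
    // substr() simply stops at the end of the string if the window overshoots.
    $chunk = substr($data, $start, $offset - $start + $radius + 1);
    return bin2hex($chunk);
}
```

Calling bytes_around($data, 1481) and comparing the hex against the expected UTF-8 byte sequences should show whether a character was split or double-encoded around that position.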

Upvotes: 0
