How to find character at byte offset in PHP?

Question

I'm trying to troubleshoot an issue with some (apparently) mangled serialized data in a MySQL database, after a conversion to UTF-8. When I try to unserialize them, I get the usual:

Notice: unserialize() [function.unserialize]: Error at offset 1481 of 255200 bytes [...]

However, given that this is a multi-byte string, I can't figure out how to find which character is at that byte offset. What I need is something like substr() but for bytes, instead of characters. How can I do that?

Thanks in advance.

Gonzalo Larralde · Accepted Answer

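In PHP, substr() already counts bytes, not characters, so you can use it directly: substr($str, 1481, 2);, substr($str, 1481, 3); or substr($str, 1481, 4);. If the string is UTF-8 and byte 1481 is the first byte of a character, you'll find that character in one of those three substrings, because a non-ASCII UTF-8 character takes 2 to 4 bytes, depending on its first byte.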
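The reported offset can also land in the middle of a multi-byte character, so a slightly more robust version of the same substr() idea is to step back to the character's lead byte first. This is a minimal sketch that assumes the data is meant to be UTF-8 and that the mbstring extension is loaded (for mb_check_encoding()); utf8_char_at_byte() is a hypothetical name, not a PHP built-in.

```php
<?php
// Hypothetical helper (not a PHP built-in): return the whole UTF-8 character
// that contains byte $offset of $str, or null if the bytes there are not
// valid UTF-8. Assumes the mbstring extension for mb_check_encoding().
function utf8_char_at_byte(string $str, int $offset): ?string
{
    if ($offset < 0 || $offset >= strlen($str)) {
        return null;
    }

    // Step back over continuation bytes (10xxxxxx) to find the lead byte.
    $start = $offset;
    while ($start > 0 && (ord($str[$start]) & 0xC0) === 0x80) {
        $start--;
    }

    // The lead byte tells how many bytes the character occupies.
    $lead = ord($str[$start]);
    if ($lead < 0x80) {
        $n = 1; // 0xxxxxxx: ASCII
    } elseif (($lead & 0xE0) === 0xC0) {
        $n = 2; // 110xxxxx
    } elseif (($lead & 0xF0) === 0xE0) {
        $n = 3; // 1110xxxx
    } elseif (($lead & 0xF8) === 0xF0) {
        $n = 4; // 11110xxx
    } else {
        return null; // stray continuation byte or invalid lead byte
    }

    // The sequence must actually cover $offset and be well-formed UTF-8.
    if ($start + $n <= $offset) {
        return null;
    }
    $char = substr($str, $start, $n); // substr() counts bytes
    return mb_check_encoding($char, 'UTF-8') ? $char : null;
}

// Usage: "a", "é" (2 bytes), "€" (3 bytes), "b"; byte 4 is inside "€".
$s = "a\u{00E9}\u{20AC}b";
echo bin2hex(utf8_char_at_byte($s, 4)), "\n"; // e282ac
```

On the data from the question this would be called as utf8_char_at_byte($data, 1481), where $data (a hypothetical name) is the string passed to unserialize(). A null result is itself useful: it suggests the bytes at that offset are not valid UTF-8.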

I've had a lot of problems with this, so if you can't find what's going on with the encoding, post back here :-) I'll try to lend you a hand.

Good luck!

Edit: Don't forget to send header("Content-type: text/html; charset=utf-8"); so the browser displays the result properly.
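If the browser's charset still gets in the way, another option is to print the raw bytes around the offset as hex, which reads the same whatever the page encoding is. This is a minimal sketch; dump_bytes_around() and its $radius parameter are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical debugging helper: list the bytes around $offset as
// "position  hex  printable-char", marking $offset with an arrow.
function dump_bytes_around(string $str, int $offset, int $radius = 8): string
{
    $start = max(0, $offset - $radius);
    // (string) cast: substr() returns false past the end on PHP 7.
    $chunk = (string) substr($str, $start, 2 * $radius + 1);
    $lines = [];
    for ($i = 0; $i < strlen($chunk); $i++) {
        $pos = $start + $i;
        $code = ord($chunk[$i]);
        // Show printable ASCII as-is and everything else as '.'.
        $printable = ($code >= 0x20 && $code < 0x7F) ? $chunk[$i] : '.';
        $marker = ($pos === $offset) ? ' <--' : '';
        $lines[] = sprintf('%6d  %02x  %s%s', $pos, $code, $printable, $marker);
    }
    return implode("\n", $lines);
}

// Usage: inspect the bytes around offset 8 of a small serialized string.
echo dump_bytes_around('s:4:"caf' . "\xC3\xA9" . '";', 8, 4), "\n";
```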
