Rizon

Reputation: 1546

Why do PHP and Obj-C encode strings differently?

I'm trying to convert a string to UTF-8 in both Obj-C and PHP, but I get different results:

"\xd7\x91\xd7\x93\xd7\x99\xd7\xa7\xd7\x94" //Obj-C
"\u05d1\u05d3\u05d9\u05e7\u05d4" //PHP

Obj-C code:

const char *cData = [@"בדיקה" cStringUsingEncoding:NSUTF8StringEncoding];

PHP code:

utf8_encode('בדיקה')

This difference breaks the hash algorithm that follows. How can I get both strings encoded the same way? Should I change the Obj-C or the PHP?

Upvotes: 2

Views: 127

Answers (2)

yonosoytu

Reputation: 3319

  1. Go to http://www.utf8-chartable.de/unicode-utf8-table.pl
  2. In the combo box switch to “U+0590 … U+05FF Hebrew”
  3. Scroll down to “U+05D1”, which is the first character of your input string (displayed rightmost, because Hebrew is written right to left).
  4. The third column shows the two UTF-8 bytes: “d7 91”
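The same lookup can be done in code instead of in the table. This is a minimal sketch in Python (chosen only for illustration; it is not part of the question's code) that prints each code point of the word next to its UTF-8 bytes:

```python
# The word from the question, written as Unicode code points.
word = "\u05d1\u05d3\u05d9\u05e7\u05d4"

# Pair each code point with the hex form of its UTF-8 encoding.
rows = [
    (f"U+{ord(ch):04X}", " ".join(f"{b:02x}" for b in ch.encode("utf-8")))
    for ch in word
]

for code_point, utf8_hex in rows:
    print(code_point, utf8_hex)
```

The first row should match step 4: U+05D1 paired with d7 91.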

If you keep looking, you will see that the PHP and Objective-C results are actually the same. The “problem” you are seeing is that PHP shows a Unicode escape (\u) for each character, while Objective-C shows a hexadecimal escape (\x) for each byte. Those are only visual representations of the string; the bytes in memory are the same.
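To check this for the whole string, the two outputs from the question can be compared directly. A small Python sketch (Python is used here only as a neutral way to inspect bytes; the variable names are illustrative):

```python
# Output shown for Objective-C: one \x escape per UTF-8 byte.
objc_bytes = b"\xd7\x91\xd7\x93\xd7\x99\xd7\xa7\xd7\x94"

# Output shown for PHP: one \u escape per Unicode code point.
php_text = "\u05d1\u05d3\u05d9\u05e7\u05d4"

# Encoding the code points as UTF-8 yields the Objective-C bytes.
same_bytes = php_text.encode("utf-8") == objc_bytes
print(same_bytes)  # True
```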

If your hash algorithm deals with bytes correctly, you should not see differences.
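As a rough illustration, assuming SHA-256 (the question does not say which hash is used), hashing the UTF-8 bytes gives the same digest whichever notation the string was written in. The digest changes only if the escape sequence itself is hashed as literal ASCII text, which is one possible way to end up with a mismatch:

```python
import hashlib

# UTF-8 bytes, as in the Objective-C output.
utf8_bytes = b"\xd7\x91\xd7\x93\xd7\x99\xd7\xa7\xd7\x94"
# The same text written with Unicode escapes, as in the PHP output.
text = "\u05d1\u05d3\u05d9\u05e7\u05d4"
# The escape sequence taken literally: 30 ASCII characters, backslashes included.
literal_escape = "\\u05d1\\u05d3\\u05d9\\u05e7\\u05d4"

digest_from_bytes = hashlib.sha256(utf8_bytes).hexdigest()
digest_from_text = hashlib.sha256(text.encode("utf-8")).hexdigest()
digest_from_literal = hashlib.sha256(literal_escape.encode("ascii")).hexdigest()

print(digest_from_bytes == digest_from_text)     # True
print(digest_from_bytes == digest_from_literal)  # False
```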

Upvotes: 2

Drew Shafer

Reputation: 4802

What are you using to do the encoding on PHP? It looks like you're generating a UTF-16 string.

Try utf8_encode() and see if that gives better results.

Upvotes: 1
