Reputation: 4010
I'm trying to read some MS Word documents in PHP that contain Unicode text, such as Hebrew or Arabic, but the content is read as binary and turns into nonsense characters. I googled some sample code, but none of it worked properly. Do you have any experience with Unicode documents in Arabic or Hebrew? Thanks.
Upvotes: 0
Views: 576
Reputation: 1308
One of the drawbacks of PHP is that, at least until recently, it has been Unicode-ignorant. You usually get by simply ignoring the fact that what you are reading is Unicode and hoping that the web browser your document ends up in knows how to deal with it. PHP doesn't destroy anything; it treats strings as raw bytes and just doesn't care what they encode.
Depending on what you are trying to do, there are a few additions to PHP that allow improved Unicode handling. Among them are the mb_ string functions, which cope with multi-byte strings.
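To illustrate the difference, here is a minimal sketch that assumes the mbstring extension is enabled and that the text is already UTF-8. The sample word is my own choice (the Hebrew "shalom", written as byte escapes so the example does not depend on how the source file is saved); it is not taken from the question.

```php
<?php
// Assumption: mbstring is loaded and the string is UTF-8.
// Four Hebrew letters (U+05E9 U+05DC U+05D5 U+05DD), two bytes each in UTF-8.
$word = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

// strlen() counts bytes, because PHP strings are plain byte arrays.
$byteCount = strlen($word);

// mb_strlen() counts characters once it is told the encoding.
$charCount = mb_strlen($word, 'UTF-8');

// substr() would cut a letter in half; mb_substr() respects character boundaries.
$firstLetter = mb_substr($word, 0, 1, 'UTF-8');

echo $byteCount, "\n";            // 8
echo $charCount, "\n";            // 4
echo bin2hex($firstLetter), "\n"; // d7a9
```

The same pattern applies to the other mb_ functions: pass the encoding explicitly, or set it once with mb_internal_encoding('UTF-8').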
You also need to find out how the text is encoded in the Word document. Unicode can be stored in several encodings, the most popular, and often the most compact, being UTF-8, yet there are also UTF-16 and UTF-32.
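Once you know the source encoding, converting to UTF-8 for output is a single call. The sketch below assumes the text turns out to be UTF-16LE, which is only a guess on my part and needs to be checked against your actual files. The input bytes are a made-up sample, and pulling the text bytes out of the Word file itself is outside the scope of this snippet.

```php
<?php
// Hypothetical input: the Hebrew word "shalom" as UTF-16LE bytes
// (two bytes per letter, low byte first).
$utf16 = "\xE9\x05\xDC\x05\xD5\x05\xDD\x05";

// mb_convert_encoding(string, to_encoding, from_encoding)
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');

// Print the resulting bytes as hex so the result is visible on any terminal.
echo bin2hex($utf8), "\n"; // d7a9d79cd795d79d
```

If the output is served to a browser, also declare the encoding, for example with header('Content-Type: text/html; charset=UTF-8'), so the browser does not have to guess.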
Upvotes: 1