ePezhman

Reputation: 4010

PHP reading "Unicode" MS Word doc

I'm trying to read some MS Word documents in PHP that contain Unicode text, such as Hebrew or Arabic, but PHP reads the file as binary and the text turns into nonsense characters. I googled some sample code, but none of it worked properly. Do you have any experience with Unicode documents in Arabic or Hebrew? Thanks.

Upvotes: 0

Views: 576

Answers (1)

Fabian Schuiki

Reputation: 1308

One of the drawbacks of PHP is that it has been (at least until recently) Unicode-ignorant: its native strings are just sequences of bytes. You usually get by with this by simply ignoring the fact that what you are reading is Unicode and hoping that the web browser your document ends up in knows how to deal with it. PHP doesn't destroy anything; it just doesn't care.

Depending on what you are trying to do, there are a few additions to PHP that allow improved Unicode handling. Among them are the mb_ string functions, which cope with multi-byte strings.
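To illustrate the difference, here is a minimal sketch comparing the byte-oriented string functions with their mb_ counterparts. It assumes PHP 7 or later (for the "\u{...}" escapes) and that the mbstring extension is enabled; the sample word is just an illustration, not something taken from a Word file.

```php
<?php
// "shalom" in Hebrew, built from code points so the example does not
// depend on the encoding of this source file. PHP emits it as UTF-8.
$word = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}";

// strlen() counts bytes: each of these Hebrew letters takes 2 bytes in UTF-8.
echo strlen($word), "\n";

// mb_strlen() counts characters when told the encoding.
echo mb_strlen($word, 'UTF-8'), "\n";

// mb_substr() cuts on character boundaries; substr() would cut on bytes
// and could split a letter in half.
$firstTwo = mb_substr($word, 0, 2, 'UTF-8');
echo $firstTwo, "\n";
```

Passing the encoding explicitly to every mb_ call avoids depending on whatever default encoding the server happens to be configured with.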

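You also need to find out how the text is encoded in the Word document. Unicode can be stored in several encodings: the most popular is UTF-8, which is also the most compact for mostly-ASCII text, but there are also UTF-16 and UTF-32. For Hebrew or Arabic, UTF-8 and UTF-16 both take two bytes per letter.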
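Once you know the source encoding, converting to UTF-8 is a single call. The sketch below assumes the text in your document turns out to be UTF-16LE (little-endian UTF-16), which is a guess you would need to verify for your files. The helper name utf16le_to_utf8 is made up for this example, and the sample bytes are typed in by hand. Locating the text inside the .doc file is a separate problem and is not shown here.

```php
<?php
// Hypothetical helper: convert a run of UTF-16LE bytes to UTF-8.
// The 'UTF-16LE' source encoding is an assumption; check your document.
function utf16le_to_utf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}

// "salam" in Arabic (U+0633 U+0644 U+0627 U+0645) as raw UTF-16LE bytes,
// low byte first, the way it would look when read as binary.
$raw = "\x33\x06\x44\x06\x27\x06\x45\x06";

$text = utf16le_to_utf8($raw);

// Sanity check that the result is well-formed UTF-8 before sending it on.
var_dump(mb_check_encoding($text, 'UTF-8'));
echo $text, "\n";
```

If the output still looks garbled in the browser, make sure the page itself is served as UTF-8, for example with header('Content-Type: text/html; charset=utf-8').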

Upvotes: 1
