Reputation: 2495
I'm trying to extract text from a Word .DOC file with PHP. Everything seems to work, except that I get something like
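&#1057;&#1059;&#1044;&#1054;&#1042;&#1040; &#1041;&#1059;&#1061;&#1043;&#1040;&#1051;&#1058;&#1045;&#1056;&#1030;&#1071;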
instead of Russian text. I've tried html_entity_decode and utf8_encode, but neither helped. Is there a simple solution?
Upvotes: 5
Views: 1530
Reputation: 655239
html_entity_decode should work if you pass the charset explicitly (from PHP 5.4.0 on, UTF-8 is already the default, so this only matters on older versions):
html_entity_decode($str, ENT_QUOTES, 'UTF-8')
This will convert the character references into UTF-8. Before PHP 5.4.0, the charset parameter’s default value was ISO-8859-1. In that case the Cyrillic characters can’t be converted, as the ISO 8859-1 character set doesn’t contain them.
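As a minimal sketch of the call above: the input string here is an assumed sample (the numeric character references for the first word of the text in the question), not actual output from a .DOC extractor, and the variable names are only illustrative.

```php
<?php
// Assumed sample input: decimal character references for the Cyrillic
// letters U+0421, U+0423, U+0414, U+041E, U+0412, U+0410.
$raw = '&#1057;&#1059;&#1044;&#1054;&#1042;&#1040;';

// Pass the charset explicitly so the result does not depend on the
// default of the PHP version in use.
$text = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// Each of these Cyrillic letters takes two bytes in UTF-8.
echo $text, PHP_EOL;          // СУДОВА
echo strlen($text), PHP_EOL;  // 12
```

The page or terminal showing the result also has to be treated as UTF-8, otherwise the decoded bytes will still look garbled. utf8_encode only re-encodes ISO-8859-1 bytes as UTF-8 and leaves character references untouched, which is presumably why it had no effect here.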
Upvotes: 4