umpirsky

Reputation: 10024

PHP convert string from windows-1250 to utf-8

I'm crawling a windows-1250 site (meta http-equiv="Content-Type" content="text/html; charset=windows-1250").

Since my database is utf-8, I need to convert data to utf-8.

For that job I tried iconv('windows-1250', 'UTF-8', $s); it gives "ÄŚarls" instead of "Čarls".

I get slightly better results when the encodings switch places: iconv('UTF-8', 'windows-1250', $s); gives "Èarls" instead of "Čarls". Strange.

Do you have any idea how I can convert this to utf-8?

Thanks in advance.

Upvotes: 0

Views: 16891

Answers (2)

umpirsky

Reputation: 10024

Folks, I'm really sorry. It was a database problem. $connection->setCharset('UTF8'); fixed it. No iconv, no mbstring.

I was so sure that I needed to convert the charset that I forgot to check whether it works on a utf8 page without conversion.
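For anyone hitting the same symptom: one plausible way to end up with "ÄŚarls" (I have not confirmed this is exactly what happened in my case) is converting a string that is already UTF-8. The sketch below assumes a hypothetical input whose bytes are already the UTF-8 encoding of "Čarls" and shows what iconv produces when told those bytes are windows-1250.

```php
<?php
// Hypothetical input: "Čarls" that is ALREADY UTF-8 (Č = bytes C4 8C).
$alreadyUtf8 = "\xC4\x8Carls";

// Telling iconv the bytes are windows-1250 makes it read
// C4 as "Ä" and 8C as "Ś", then encode those two characters as UTF-8.
$doubleEncoded = iconv('windows-1250', 'UTF-8', $alreadyUtf8);

echo bin2hex($alreadyUtf8), "\n";   // c48c61726c73
echo bin2hex($doubleEncoded), "\n"; // c384c59a61726c73, displayed as "ÄŚarls"
```

If the second hex dump matches what you see, the conversion step is the thing to remove, and the connection charset is the thing to check.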

Thanks for all the comments.

Upvotes: 2

borrible

Reputation: 17376

I'd recommend first verifying whether or not the correct data is reaching iconv (and similarly what is going out from iconv).

Use a statement like echo bin2hex($string) and look at the byte stream for $s before iconv. If you've got the string you believe you have, the first byte should be c8. If you then look at the byte stream after iconv, the first bytes should be c48c (in UTF-8); if you convert to UCS-2 you'd see 010c, which is the code point of the relevant character in Unicode (U+010C).

Depending on the results of this, you'll know whether your problem lies with gathering the data (i.e. you did not see the c8), your iconv installation (i.e. the conversion produces the wrong result) or putting that data into your database (i.e. the result of iconv is as expected).
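The byte-level check described above could look roughly like the following. This is only a sketch: $s is hard-coded here as a stand-in for the crawled string, and I use UCS-2BE rather than plain UCS-2 so the byte order of the output does not depend on the platform.

```php
<?php
// Stand-in for the crawled data: "Čarls" in windows-1250, where Č is 0xC8.
$s = "\xC8arls";

// Step 1: bytes going into iconv. The first byte should be c8.
echo bin2hex($s), "\n";     // c861726c73

// Step 2: bytes coming out of iconv. The first bytes should be c48c.
$utf8 = iconv('windows-1250', 'UTF-8', $s);
echo bin2hex($utf8), "\n";  // c48c61726c73

// Optional: as big-endian UCS-2 the first unit is the code point itself.
$ucs2 = iconv('windows-1250', 'UCS-2BE', $s);
echo bin2hex($ucs2), "\n";  // 010c00610072006c0073
```

If step 1 does not start with c8, look at how the data is fetched; if step 2 is wrong, look at iconv; if both are right, look at the database side.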

Upvotes: 0
