Reputation: 2866
I fetch lines of UTF-8 text from a page and then dump them into a file. The text in the original page appears fine. However, the text in the output file appears scrambled!
My attempt:
$myFile = "testFile.txt";
$fh = fopen($myFile, 'w') or die("can't open file");
$pageContent = file_get_contents("page.html");
//Here: use regex to grab the title ...
$stringData = $title."\n";
fwrite($fh, utf8_encode($stringData));
fclose($fh);
Before writing anything to the file, I tried saving it as UTF-8 and also as Unicode, but I still get scrambled text such as:
ÊãäíÇÊí ááÌãíÚ
I'm not using PHP5
Any help will be appreciated...
Upvotes: 1
Views: 4011
Reputation: 522195
Don't use utf8_encode!
Sorry for the shouting, it's just misused way too often.
Your text is already in UTF-8.* You do not need to encode it to UTF-8 again.
utf8_encode converts Latin1 (ISO-8859-1) encoded text to UTF-8. Your text is not Latin1 encoded. That's why it screws up. Just read and write the text, done. No encoding conversion or re-encoding necessary.
* Assuming page.html is encoded in UTF-8. From what you're saying, it seems to be.
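A minimal sketch of "just read and write the text", under some assumptions that are not in the question: the temporary files, the sample title and the simplified title regex are stand-ins, and the mb_check_encoding call needs the mbstring extension. It only illustrates that the bytes go out exactly as they came in.

```php
<?php
// Hypothetical stand-in for page.html: a title containing "é" as UTF-8 (bytes 0xC3 0xA9).
$page = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($page, "<title>caf\xC3\xA9</title>");

$pageContent = file_get_contents($page);

// Simplified title extraction; the regex used in the question is not shown.
preg_match('/<title>(.*?)<\/title>/s', $pageContent, $matches);
$title = isset($matches[1]) ? $matches[1] : '';

// Optional sanity check of the "already UTF-8" assumption (requires mbstring).
$isUtf8 = mb_check_encoding($title, 'UTF-8');

// Write the bytes unchanged: no utf8_encode() call.
$myFile = tempnam(sys_get_temp_dir(), 'out');
$fh = fopen($myFile, 'w') or die("can't open file");
fwrite($fh, $title . "\n");
fclose($fh);

// Hex dump of the output file: 636166c3a90a ("café" plus a newline, still UTF-8).
echo bin2hex(file_get_contents($myFile)), "\n";
```

If the check reports false, the page is probably in some other encoding, and the fix would be a conversion from that encoding rather than from Latin1.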
Upvotes: 7
Reputation: 17366
It looks like you are double encoding. If you read the utf8_encode documentation you'll see that it is designed to encode ISO-8859-1 strings into UTF-8. If you've already got a UTF-8 string you should not run this function on it; otherwise it will treat every byte as an ISO-8859-1 character and encode it a second time, which corrupts any non-ASCII text.
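To see what double encoding does at the byte level, here is a small sketch. It uses mb_convert_encoding from ISO-8859-1 to UTF-8 as a stand-in for utf8_encode (the same conversion, assuming the mbstring extension is available), and the sample character "é" is just an illustration.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$utf8 = "\xC3\xA9";

// Same conversion utf8_encode() performs: every input byte is read as one
// ISO-8859-1 character and re-encoded as UTF-8.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($double), "\n"; // c383c2a9
```

The two original bytes become four, and a UTF-8 viewer shows them as "Ã©" instead of "é".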
Upvotes: 0