user311509

Reputation: 2866

Read UTF-8 Lines of Text and Write them to File

I fetch lines of UTF-8 text from a page and then dump them into a file. The text in the original page appears fine, but the text in the output file appears scrambled!

My attempt:

$myFile = "testFile.txt";
$fh = fopen($myFile, 'w') or die("can't open file");
$pageContent = file_get_contents("page.html");
//Here: use regex to grab the title ...
$stringData = $title."\n";
fwrite($fh, utf8_encode($stringData));
fclose($fh);

Before writing anything to the file, I saved it as UTF-8, and I also tried saving it as Unicode. I still get scrambled text such as:

ÊãäíÇÊí ááÌãíÚ

I'm not using PHP 5.

Any help will be appreciated...

Upvotes: 1

Views: 4011

Answers (2)

deceze

Reputation: 522195

Don't use utf8_encode!

Sorry for the shouting, it's just misused way too often.
Your text is already in UTF-8.* You do not need to encode it to UTF-8 again.
utf8_encode converts Latin-1 (ISO-8859-1) encoded text to UTF-8. Your text is not Latin-1 encoded, so every byte of it gets misread and re-encoded. That's why it screws up. Just read and write the text, done. No encoding conversion or re-encoding necessary.

* Assuming page.html is encoded in UTF-8. From what you're saying, it seems to be.
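
For illustration, here is a minimal sketch of "just read and write the text". It is not the asker's real script: the temp files stand in for page.html and testFile.txt, the sample text and the title regex are made up, and it assumes a reasonably modern PHP with the mbstring extension loaded. The mb_check_encoding call tests the assumption from the footnote before anything is written.

```php
<?php
// Sketch only: temp files stand in for page.html and testFile.txt.
$page   = tempnam(sys_get_temp_dir(), 'page');
$myFile = tempnam(sys_get_temp_dir(), 'out');

// Made-up UTF-8 sample page (assumes this script itself is saved as UTF-8).
$sample = "مرحبا";
file_put_contents($page, "<title>" . $sample . "</title>");

$pageContent = file_get_contents($page);

// Check the "page is UTF-8" assumption instead of trusting it (needs mbstring).
if (!mb_check_encoding($pageContent, 'UTF-8')) {
    die("page is not valid UTF-8; convert it from its real encoding first");
}

// Hypothetical title extraction; the question does not show the real regex.
$title = '';
if (preg_match('/<title>(.*?)<\/title>/s', $pageContent, $m)) {
    $title = $m[1];
}

$fh = fopen($myFile, 'w') or die("can't open file");
fwrite($fh, $title . "\n"); // bytes go out unchanged, no utf8_encode
fclose($fh);

// Read the result back, then remove the temp files.
$written = file_get_contents($myFile);
unlink($page);
unlink($myFile);
```

If the check fails, the page is in some other encoding, and it would need a conversion from that specific encoding rather than a call to utf8_encode.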

Upvotes: 7

borrible

Reputation: 17366

It looks like you are double encoding. If you read the utf8_encode documentation you'll see that it is designed to encode ISO-8859-1 strings into UTF-8. If you've already got a UTF-8 string you should not run this function on it; otherwise it will interpret each byte as an ISO-8859-1 character and encode it again, which corrupts the text.
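
A small sketch of what double encoding does at the byte level. It uses mb_convert_encoding with an explicit ISO-8859-1 source as a stand-in for utf8_encode, and assumes the mbstring extension is available. The sample character is arbitrary.

```php
<?php
// "é" as UTF-8 is the two bytes C3 A9.
$original = "\xC3\xA9";

// Treat those bytes as ISO-8859-1 and encode to UTF-8 again.
// Each input byte becomes its own two-byte UTF-8 sequence.
$double = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo bin2hex($original), "\n"; // c3a9
echo bin2hex($double), "\n";   // c383c2a9

// Reversing the extra step recovers the original bytes.
$repaired = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
echo bin2hex($repaired), "\n"; // c3a9
```

The four bytes in $double display as two unrelated characters in a UTF-8 viewer, which is the kind of garbage you get for every non-ASCII character in the output file.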

Upvotes: 0
