I'm using this function to convert a .csv file to JSON. (The content of the .csv file is in Chinese.) Then I write the JSON string into a file using file_put_contents('myfile.json', $JSON).
myfile.json displays correctly when opened in Notepad and when echoed in the command line, but when opened in Sublime Text it spews out horrible text like this:
[{"¶mÂí¥«°Ï":"«n¬ñ¶m","¥æ©ö¼Ðªº":"¤g¦a", ........ }]
Opening it in Chrome gives the same ugly text.
I copy the correct text from Notepad and paste it into Sublime Text, and Sublime Text renders it correctly. I save the new file in Sublime Text and reopen it, and it still renders correctly.
Question:
Why do different applications render the same text differently when I'm sure they're all "ready" to render UTF-8 text?
In Chrome, why does echo file_get_contents("ChineseText.txt") give horrible text, while echo '張三' gives the expected result?
I know my question statement isn't very clear. I'll respond to your comments as soon as possible, since this problem has been troubling me for a long time. Thanks in advance.
---Update---
Inspired by @KyawLay, I did a quick experiment.
I changed the file_put_contents call to file_put_contents("myFile", utf8_encode($result));. When the file is then opened in Notepad, it displays exactly the same ugly text as in Sublime Text and Chrome. I guess that's because the text has been encoded twice. So, in the first place, Chrome and Sublime Text must have encoded the text in the background, causing a double-encoding problem and therefore rendering it wrongly. Is that correct?
Answer:
file_put_contents doesn't do any encoding conversion whatsoever; all it does is dump raw bytes into a file. Since you can see the contents as expected in one application, that part is working. And since the function you're using doesn't do any encoding conversion itself either, the result is in the same encoding as the original file, whatever that is.
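A quick way to convince yourself of this is to write a few non-UTF-8 bytes and read them back. The four bytes below are an assumed Big5 sample, written as escape sequences so the script's own encoding doesn't matter; the only point is that the round trip leaves them untouched.

```php
<?php
// Hypothetical Big5 bytes for "土地", spelled out as escapes.
$bytes = "\xA4\x67\xA6\x61";

$path = tempnam(sys_get_temp_dir(), 'enc');
file_put_contents($path, $bytes);      // writes the bytes verbatim, no conversion
$readBack = file_get_contents($path);  // reads them back verbatim
unlink($path);

var_dump($readBack === $bytes);        // bool(true)
echo bin2hex($readBack), PHP_EOL;      // a467a661
```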
The problem is simply that a plain text file doesn't declare its encoding anywhere. It's just a sequence of raw bytes, and it's entirely up to the reading application to interpret those bytes in the right encoding. Notepad happens to do it correctly in this case; Sublime happens to guess wrong. If you open the file while explicitly telling Sublime what encoding it's in, it should display it fine as well (I'm not sure exactly where that option is in Sublime). The same goes for your browser: if you don't tell it via the Content-Type HTTP header what encoding the content you're sending is in, it may guess wrong.
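To make "interpret those bytes" concrete, here is the same four-byte sample decoded under two encodings. It assumes the file is Big5 (an inference from the garbled output in the question, not something the question states) and that the iconv extension is available.

```php
<?php
// One sequence of bytes; what you see depends entirely on the encoding the reader assumes.
$bytes = "\xA4\x67\xA6\x61";  // hypothetical Big5 content

$asBig5   = iconv('BIG5', 'UTF-8', $bytes);        // the reader guesses Big5
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $bytes);  // the reader guesses Latin-1

echo $asBig5, PHP_EOL;    // 土地
echo $asLatin1, PHP_EOL;  // ¤g¦a
```

When serving such a file to a browser, either convert it to UTF-8 first or declare what it actually is, for example with header('Content-Type: text/plain; charset=Big5'), so the browser doesn't have to guess.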
BTW, you should never cobble together JSON by hand the way that function does; use json_encode instead. For that, you'll likely need to convert your CSV data from whatever encoding it's in to UTF-8 first, since json_encode only works with UTF-8.
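A minimal sketch of that approach, under stated assumptions: the source encoding defaults to Big5 (inferred from the garbled sample, so change it if your file differs), the first CSV row supplies the JSON keys as in the question's output, and csvToJsonFile is a hypothetical name rather than the function the question links to. The whole file is converted to UTF-8 before parsing, so the CSV parser never sees multi-byte Big5 sequences, whose second byte can be an ASCII character such as a backslash.

```php
<?php
// Hypothetical helper: CSV -> UTF-8 -> json_encode(). Requires PHP 8 and the iconv extension.
function csvToJsonFile(string $csvPath, string $jsonPath, string $sourceEncoding = 'BIG5'): void
{
    $raw = file_get_contents($csvPath);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $csvPath");
    }

    // Convert everything to UTF-8 up front.
    $utf8 = iconv($sourceEncoding, 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Input is not valid $sourceEncoding");
    }

    // Parse the converted text through an in-memory stream.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $utf8);
    rewind($stream);

    // Escape character passed explicitly ('' disables backslash escaping).
    $header = fgetcsv($stream, null, ',', '"', '');
    $records = [];
    while ($header !== false && ($row = fgetcsv($stream, null, ',', '"', '')) !== false) {
        if ($row === [null]) {
            continue;  // skip blank lines
        }
        $records[] = array_combine($header, $row);  // first row supplies the keys
    }
    fclose($stream);

    // JSON_UNESCAPED_UNICODE keeps the Chinese text readable in the output file.
    $json = json_encode($records, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
    file_put_contents($jsonPath, $json);
}
```

Called as csvToJsonFile('data.csv', 'myfile.json'), this writes a UTF-8 JSON file, which editors and browsers that default to UTF-8 should display correctly.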
It works when you do echo '張三' or copy and paste the content into Sublime, because then the content is saved in whatever encoding Sublime saves it as (likely UTF-8), which happens to be what your browser expects by default.
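The difference can be checked directly. This sketch assumes the script itself is saved as UTF-8 and reuses the hypothetical Big5 sample from earlier as a stand-in for what file_get_contents("ChineseText.txt") might return; isUtf8 is a hypothetical helper built on the fact that preg_match() with the /u modifier returns false for a subject that isn't valid UTF-8.

```php
<?php
// A literal typed into a UTF-8-saved script is stored as UTF-8 bytes.
$literal = '張三';
echo bin2hex($literal), PHP_EOL;  // e5bcb5e4b889

// Hypothetical Big5 content, as read from a Big5-encoded file.
$fromFile = "\xA4\x67\xA6\x61";

// Hypothetical helper: preg_match() with /u fails (returns false) on invalid UTF-8.
function isUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(isUtf8($literal));   // bool(true)
var_dump(isUtf8($fromFile));  // bool(false)
```

A browser expecting UTF-8 therefore decodes the literal correctly but not the file contents; converting with iconv('BIG5', 'UTF-8', ...) before echoing, or sending a matching charset header, would address that.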
I'd recommend "What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text" as an introduction to encodings.