Reputation: 97
I have a question about converting UTF-8 to CP1252 in Ubuntu with PHP or SHELL.
Background: I convert a CSV file from UTF-8 to CP1252 on Ubuntu with PHP or shell, copy the file from Ubuntu to Windows, and open it with Notepad++.
Environment :
Methods used :
With PHP
iconv("UTF-8", "CP1252", "content of file")
or
mb_convert_encoding("content of file", "CP1252", "UTF-8")
If I check the generated file with
file -i name_of_the_file
It displayed :
name_of_the_file: text/plain; charset=iso-8859-1
I copied the converted file to Windows and opened it with Notepad++. The status bar in the bottom-right corner shows the encoding as ANSI.
When I changed the encoding from ANSI to Windows-1252, the special characters were displayed correctly.
With Shell
iconv -f UTF-8 -t CP1252 "name_of_the_file"
The rest is the same.
Questions: 1. Why does the file command report ISO-8859-1 instead of CP1252 or ANSI? 2. Why were the special characters displayed correctly once I changed the encoding from ANSI to Windows-1252?
Thank you in advance !
Upvotes: 0
Views: 3419
Reputation: 9402
1.
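CP1252 and ISO-8859-1 are very similar: they differ only in the byte range 0x80-0x9F, which ISO-8859-1 assigns to control codes and Windows-1252 mostly to printable characters. Quite often a file encoded in one of them is byte-for-byte identical to the same file encoded in the other. See Wikipedia for the characters that exist in Windows-1252 but not in ISO-8859-1.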
The letters à and ç are encoded identically in both encodings. While ISO-8859-1 doesn't have an œ and CP1252 does, file might have missed that. AFAIK it doesn't analyse the entire file.
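To make the overlap concrete, here is a small sketch that hex-dumps what iconv produces for these characters. It assumes an iconv that knows both CP1252 and ISO-8859-1 (glibc's does) and the usual printf, od and tr; the helper name hex_as is made up for this example.

```shell
#!/bin/sh
# hex_as is a hypothetical helper: it reads UTF-8 on stdin, converts it to the
# encoding named in $1, and prints the resulting bytes as hex.
hex_as() {
    iconv -f UTF-8 -t "$1" | od -An -tx1 | tr -d ' \n'
}

# "àç" written as UTF-8 octal escapes, so the script does not depend on the
# encoding it is saved in.
cp1252=$(printf '\303\240\303\247' | hex_as CP1252)
latin1=$(printf '\303\240\303\247' | hex_as ISO-8859-1)
echo "CP1252:     $cp1252"
echo "ISO-8859-1: $latin1"

# "œ" (U+0153) exists in CP1252, where it is the single byte 0x9c.
oe=$(printf '\305\223' | hex_as CP1252)
echo "oe in CP1252: $oe"

# The same character has no ISO-8859-1 equivalent, so iconv should fail here.
if printf '\305\223' | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1; then
    oe_latin1=convertible
else
    oe_latin1=unconvertible
fi
echo "oe in ISO-8859-1: $oe_latin1"
```

Both conversions of "àç" should give the bytes e0 e7, so a file containing only such characters carries no evidence of which of the two encodings it was written in. Only a byte in the 0x80-0x9F range, such as 0x9c for œ, points to CP1252.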
2.
"ANSI" is a misnomer for the default non-Unicode encoding in Windows. Which encoding that is depends on the system locale: for Western European languages, ANSI means Windows-1252; for Central European languages it's Windows-1250, for Russian it's Windows-1251, and so on. Nothing apart from Windows uses the term "ANSI" to refer to an encoding.
Upvotes: 0
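As an illustration of the "ANSI" point in the answer above, this sketch decodes one and the same byte under three Windows code pages and prints the UTF-8 bytes of the result. It assumes an iconv that supports CP1250, CP1251 and CP1252; the helper name utf8_hex_from is hypothetical.

```shell
#!/bin/sh
# utf8_hex_from is a hypothetical helper: it interprets stdin as the code page
# named in $1 and prints the UTF-8 bytes of the decoded text as hex.
utf8_hex_from() {
    iconv -f "$1" -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

# The same input byte, 0xE0 (octal 340), under three "ANSI" code pages.
western=$(printf '\340' | utf8_hex_from CP1252)   # à, U+00E0
central=$(printf '\340' | utf8_hex_from CP1250)   # ŕ, U+0155
cyrillic=$(printf '\340' | utf8_hex_from CP1251)  # а, U+0430

echo "CP1252: $western"
echo "CP1250: $central"
echo "CP1251: $cyrillic"
```

The three results differ, which is why a file labelled only "ANSI" is ambiguous until the actual code page (here Windows-1252) is named.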