cupidcb

Reputation: 97

Convert UTF-8 to CP1252 in Ubuntu with PHP or the Bash shell

I have a question about converting UTF-8 to CP1252 in Ubuntu with PHP or the shell.

Background: I convert a CSV file from UTF-8 to CP1252 in Ubuntu with PHP or the shell, copy the file from Ubuntu to Windows, and open it with Notepad++.

Environment :

Methods used :

  1. With PHP
    iconv("UTF-8", "CP1252", "content of file")
    or
    mb_convert_encoding("content of file", "CP1252", "UTF-8")
    If I check the generated file with
    file -i name_of_the_file
    it displays:
    name_of_the_file: text/plain; charset=iso-8859-1
    I copied the converted file to Windows and opened it with Notepad++. At the bottom right, the status bar shows the encoding as ANSI.
    When I changed the encoding from ANSI to Windows-1252, the special characters were displayed correctly.

  2. With Shell
    iconv -f UTF-8 -t CP1252 name_of_the_file
    The rest is the same.

Questions:

  1. Why did the file command report ISO-8859-1 rather than CP1252 or ANSI?
  2. Why were the special characters displayed correctly once I changed the encoding from ANSI to Windows-1252?

Thank you in advance !

Upvotes: 0

Views: 3419

Answers (1)

Karol S

Reputation: 9402

1.

CP1252 and ISO-8859-1 are very similar. Quite often a file encoded in one of them looks identical to the same file encoded in the other. See Wikipedia for the characters that are in Windows-1252 but not in ISO-8859-1.

The letters à and ç are encoded identically in both encodings. ISO-8859-1 doesn't have an œ and CP1252 does, but the file command might have missed that. AFAIK it doesn't analyse the entire file.
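
To make the overlap concrete, here is a minimal shell sketch. It assumes an iconv that knows the names CP1252 and ISO-8859-1 (glibc's does). The helper name to_hex is made up for this illustration, and the characters are passed as printf octal escapes for their UTF-8 bytes so that the script does not depend on its own encoding.

```shell
#!/bin/sh
# to_hex is a hypothetical helper, not part of iconv.
# $1 = target encoding, $2 = printf octal escapes of the character's UTF-8 bytes.
# Prints the converted bytes as hex, or "unconvertible" if iconv produced no output.
to_hex() {
    bytes=$(printf "$2" | iconv -f UTF-8 -t "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n') || true
    echo "${bytes:-unconvertible}"
}

# à (U+00E0, UTF-8 bytes C3 A0): same single byte in both encodings
to_hex CP1252     '\303\240'   # e0
to_hex ISO-8859-1 '\303\240'   # e0

# ç (U+00E7, UTF-8 bytes C3 A7): same single byte in both encodings
to_hex CP1252     '\303\247'   # e7
to_hex ISO-8859-1 '\303\247'   # e7

# œ (U+0153, UTF-8 bytes C5 93): exists in CP1252 only
to_hex CP1252     '\305\223'   # 9c
to_hex ISO-8859-1 '\305\223'   # no valid ISO-8859-1 byte exists for this character
```

With glibc's iconv the last call should print unconvertible, because the conversion fails. Other iconv implementations may substitute a placeholder byte instead. Either way, a file containing only characters like à and ç comes out as the same bytes in both encodings, so nothing in it distinguishes CP1252 from ISO-8859-1.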

2.

"ANSI" is a misnomer used for the default non-Unicode encoding in Windows. For Western European languages, ANSI means Windows-1252. For Central European languages it's Windows-1250, for Russian it's Windows-1251, and so on. Nothing apart from Windows uses the term "ANSI" to refer to an encoding.
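
As a rough illustration of why "ANSI" is ambiguous, the sketch below decodes the same single byte (0xE0) under three of those code pages. It assumes an iconv that supports CP1250, CP1251 and CP1252. decode_hex is a hypothetical helper name, and the results are shown as the hex of the resulting UTF-8 bytes so they don't depend on the terminal.

```shell
#!/bin/sh
# decode_hex is a hypothetical helper, not part of iconv.
# $1 = source code page, $2 = printf octal escape of the input byte.
# Prints the UTF-8 encoding of the decoded character as hex.
decode_hex() {
    printf "$2" | iconv -f "$1" -t UTF-8 | od -An -tx1 | tr -d ' \n'
    echo
}

# The byte 0xE0 (octal 340) is a different character in each code page:
decode_hex CP1252 '\340'   # c3a0 = U+00E0, à
decode_hex CP1250 '\340'   # c595 = U+0155, ŕ
decode_hex CP1251 '\340'   # d0b0 = U+0430, Cyrillic а
```

So a file labelled "ANSI" only displays correctly when the reader's default code page matches the one the file was written in.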

Upvotes: 0
