beerLantern

Reputation: 492

How can I programmatically change file encoding on Linux?

Is there a program that can change a file's encoding to UTF-8 programmatically? I have about 1000 files and I want to save them all as UTF-8 on Linux.

Thanks.

Upvotes: 2

Views: 1552

Answers (2)

mti2935

Reputation: 12027

iconv is the tool for the job.

iconv -f original_charset -t utf-8 originalfile > newfile 

Upvotes: 2

Antoine

Reputation: 14064

iconv will take care of that, use it like this:

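iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt

where ISO-8859-1 is Latin-1, one of the most common 8-bit encodings, which may or may not be your input encoding. Note that iconv writes to standard output, so the result has to be redirected (or sent to a file with -o).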

If you don't know the input charset, you can detect it with the standard file command or the Python-based chardet. For instance:

iconv -f "$(file -bi in.txt | sed -e 's/.*[ ]charset=//')" -t UTF-8 in.txt > out.txt

You may want something more robust than this one-liner, for example skipping files whose encoding cannot be detected.
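As a rough sketch of that more robust variant, the function below converts a file only when file reports a usable text charset. The name to_utf8 and its two-argument form are made up for illustration, and the list of charsets to skip ("binary", "unknown-8bit") is an assumption about what file prints when it cannot identify the encoding.

```shell
#!/bin/sh
# to_utf8 SRC DEST (hypothetical helper): convert SRC to UTF-8 at DEST,
# but only when `file` reports a charset worth handing to iconv.
to_utf8() {
    src=$1
    dest=$2
    # --mime-encoding prints only the charset, e.g. "iso-8859-1" or "binary"
    charset=$(file -b --mime-encoding "$src") || return 1
    case $charset in
        binary|unknown-8bit|'')
            # Assumption: these values mean "not detected"; skip the file
            echo "skipping $src: encoding not detected ($charset)" >&2
            return 1
            ;;
    esac
    # Remove the partial output if iconv rejects the charset or the data
    iconv -f "$charset" -t UTF-8 "$src" > "$dest" || { rm -f "$dest"; return 1; }
}
```

Called as to_utf8 in.txt out.txt, it returns non-zero for anything it skipped or failed to convert, so a calling loop can log those files instead of silently mangling them. Keep in mind that file only guesses the encoding, so the result is still worth spot-checking.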

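From here, to iterate over multiple files, you can do something like the following. It writes to a temporary file first, because iconv cannot safely read and write the same file:

find . -iname '*.txt' -exec sh -c 'iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.tmp" && mv "$1.tmp" "$1"' sh {} \;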

I didn't check this, so you might want to google iconv and find, read about them here on SO, or simply read their man pages.
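For about 1000 files, the find command above can be spelled out a little more defensively. This is only a sketch: convert_tree is a hypothetical name, it assumes every matching file really is in the single source encoding you pass in, and it relies on the -iname option that GNU and BSD find provide.

```shell
#!/bin/sh
# convert_tree DIR FROM (hypothetical helper): re-encode every *.txt file
# under DIR from the FROM charset to UTF-8, replacing each file in place.
convert_tree() {
    dir=$1
    from=$2
    find "$dir" -type f -iname '*.txt' -exec sh -c '
        from=$1; shift
        for f do
            tmp=$f.utf8.tmp
            if iconv -f "$from" -t UTF-8 "$f" > "$tmp"; then
                mv "$tmp" "$f"      # replace the original only on success
            else
                rm -f "$tmp"        # leave the original untouched on failure
                echo "failed: $f" >&2
            fi
        done
    ' sh "$from" {} +
}
```

Something like convert_tree . ISO-8859-1 would then process the current directory. Run it once only: a second pass would treat the UTF-8 output as Latin-1 and double-encode it, so working on a copy of the files first is a sensible precaution.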

Upvotes: 5
