Is there a program to convert file encodings to UTF-8 programmatically? I have about 1000 files that I want to save as UTF-8 on Linux.
Thanks.
iconv is the tool for the job:
iconv -f original_charset -t utf-8 originalfile > newfile
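With 1000 files, some may already be UTF-8, and converting those again as if they were another charset would garble them. One common trick is to "convert" UTF-8 to UTF-8: iconv then acts as a validator and exits non-zero on the first invalid byte sequence. A minimal sketch, assuming a POSIX shell and an iconv that reports invalid input as an error (GNU iconv does); the function name is_utf8 and the demo file names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file is valid UTF-8, fail otherwise.
# UTF-8 -> UTF-8 "conversion" makes iconv validate its input; it exits
# non-zero at the first byte sequence that is not valid UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Demo on two throwaway files.
dir=$(mktemp -d)
printf 'caf\303\251\n' > "$dir/utf8.txt"    # "café" encoded as UTF-8
printf 'caf\351\n'     > "$dir/latin1.txt"  # "café" encoded as ISO-8859-1

for f in "$dir/utf8.txt" "$dir/latin1.txt"; do
    if is_utf8 "$f"; then
        echo "already UTF-8: $(basename "$f")"
    else
        echo "needs conversion: $(basename "$f")"
    fi
done
```

Plain ASCII files pass this check too, which is correct: ASCII is a subset of UTF-8, so they need no conversion.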
iconv will take care of that. Use it like this:
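iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt
where ISO-8859-1 is Latin-1, one of the most common 8-bit encodings, which may or may not be your input encoding. iconv writes to standard output, so redirect it; a second file name would be read as another input, not used as the output file.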
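To see what the Latin-1 to UTF-8 conversion does at the byte level, here is a small sketch (the file names are hypothetical; od is only used to dump the bytes). In Latin-1, "é" is the single byte 0xE9; in UTF-8 it becomes the two bytes 0xC3 0xA9, while ASCII bytes pass through unchanged:

```shell
#!/bin/sh
# Write "café" plus a newline as ISO-8859-1: bytes 63 61 66 e9 0a.
dir=$(mktemp -d)
printf 'caf\351\n' > "$dir/in.txt"

# Convert; note the redirect, since iconv writes to standard output.
iconv -f ISO-8859-1 -t UTF-8 "$dir/in.txt" > "$dir/out.txt"

# Dump both files as hex bytes to compare them.
od -An -tx1 "$dir/in.txt"   # bytes: 63 61 66 e9 0a
od -An -tx1 "$dir/out.txt"  # bytes: 63 61 66 c3 a9 0a
```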
If you don't know the input charset, you can detect it with the standard file command or the Python-based chardet. For instance:
iconv -f "$(file -bi in.txt | sed -e 's/.*[ ]charset=//')" -t UTF-8 in.txt > out.txt
You may want something more robust than this one-liner, such as skipping files whose encoding cannot be detected.
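A more defensive version might look like the sketch below. It assumes GNU file's --mime-encoding flag (which prints just the charset, like the sed extraction above) and treats "binary" and "unknown-8bit" as undetectable; the function name convert_detected is hypothetical. Keep in mind that file only guesses, and it cannot reliably tell similar 8-bit encodings such as ISO-8859-1 and Windows-1252 apart:

```shell
#!/bin/sh
# Hypothetical helper: convert one file to UTF-8 in place, using file(1)'s
# guess as the source charset. Skips files that are already UTF-8/ASCII and
# files whose charset file(1) cannot name.
convert_detected() {
    f=$1
    charset=$(file -b --mime-encoding "$f")
    case $charset in
        utf-8|us-ascii)
            echo "skip (already compatible): $f"
            return 0 ;;
        binary|unknown-8bit|"")
            echo "skip (unknown charset): $f" >&2
            return 1 ;;
    esac
    # Convert into a temporary file first: redirecting straight into "$f"
    # would truncate it before iconv could read it.
    if iconv -f "$charset" -t UTF-8 "$f" > "$f.tmp"; then
        mv "$f.tmp" "$f"
        echo "converted from $charset: $f"
    else
        rm -f "$f.tmp"
        echo "iconv failed ($charset): $f" >&2
        return 1
    fi
}

# Example: convert_detected myfile.txt
```

Because mv replaces the file, the converted copy gets the permissions of the newly created temporary file; if that matters, `cat "$f.tmp" > "$f" && rm "$f.tmp"` keeps the original file's mode and ownership instead.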
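From there, to process many files, you can do something like the following. iconv cannot write back into the file it is reading, so each file goes through a temporary copy:
find . -iname '*.txt' -exec sh -c 'iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.tmp" && mv "$1.tmp" "$1"' sh {} \;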
I didn't check this, so you might want to google iconv and find, read about them here on SO, or simply read their man pages.
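For a one-off batch of around 1000 files, keeping a backup of each original is cheap insurance. The sketch below assumes every matching file really is ISO-8859-1 (files that are already UTF-8 would be double-converted and come out garbled) and that find supports -iname and `-exec ... {} +`, as GNU and BSD find do; the function name convert_tree is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: convert every *.txt file under a directory tree from
# ISO-8859-1 to UTF-8 in place, keeping a .bak copy of each original.
convert_tree() {
    # "-exec ... {} +" hands many file names to one sh process at a time;
    # they arrive as "$@", so names containing spaces stay intact.
    find "$1" -type f -iname '*.txt' -exec sh -c '
        for f in "$@"; do
            # Copy first (cp -p keeps mode and timestamps), then read the
            # copy and overwrite the original, so iconv never reads the
            # file that is being truncated.
            if cp -p "$f" "$f.bak" &&
               iconv -f ISO-8859-1 -t UTF-8 "$f.bak" > "$f"; then
                echo "converted: $f"
            else
                echo "failed: $f" >&2
            fi
        done
    ' sh {} +
}

# Example: convert_tree /path/to/files
```

Overwriting the original through a redirect keeps its inode, permissions and ownership. Run it only once per tree: a second run would convert the already-converted files again and overwrite the .bak copies with them.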