Reputation: 23
The situation:
I have a bunch of text files (.csv, to be precise), around 20000 of them, that differ in character encoding: file -i *.csv gives me charset=us-ascii for most, but some are utf-16le.
The goal:
I want them all to be encoded the same way, us-ascii here. I'm thinking of a one-liner that checks the encoding of each file in the directory and, if it is utf-16le, converts it to us-ascii.
I only started learning bash a few days ago, so this one still escapes me. Is it possible to do something like running file -i on each file (I did that), capturing its output, checking which encoding it reports and, if it is not us-ascii, converting the file?
Thanks for helping me understand how to do that!
Upvotes: 0
Views: 4298
Reputation: 9281
This will convert any *.csv files that are not us-ascii encoded to us-ascii:
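#!/bin/bash
for f in *.csv; do
    # -b leaves out the file name, so only "text/plain; charset=..." is parsed
    charset=$(file -b -i "$f" | grep -o 'charset=.*' | cut -d= -f2)
    if [ "$charset" != "us-ascii" ]; then
        echo "$f $charset -> us-ascii"
        # utf-16le/be files normally start with a byte-order mark; iconv's generic
        # "utf-16" consumes it, whereas "utf-16le" would keep it as U+FEFF,
        # which has no us-ascii equivalent
        case "$charset" in utf-16le|utf-16be) charset=utf-16 ;; esac
        # replace the original only if the conversion succeeded
        iconv -f "$charset" -t us-ascii < "$f" > "$f.tmp" \
            && mv "$f.tmp" "$f"
    fi
done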
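Because the script hands every charset other than us-ascii straight to iconv, it may be worth checking first which charsets file(1) actually reports across all 20000 files; values such as binary or unknown-8bit are not encodings iconv can read. A minimal sketch, assuming GNU file with --mime-encoding support; the function name charset_summary is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: count how many *.csv files in a directory (default: the
# current one) file(1) assigns to each charset, most common first.
charset_summary() {
    (
        cd "${1:-.}" || exit 1
        set -- *.csv
        [ -e "$1" ] || exit 0      # glob matched nothing: no *.csv files here
        # -b suppresses the file names; --mime-encoding prints only the charset
        file -b --mime-encoding -- "$@" | sort | uniq -c | sort -rn
    )
}
```

Running charset_summary in the directory prints one count per charset, so anything other than us-ascii and utf-16le shows up before any file is touched.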
Upvotes: 1
Reputation: 2235
The other solutions don't account for the mix of encodings; what you describe sounds like something along these lines:
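for F in *.csv; do
    # with -b, file prints only "text/plain; charset=utf-16le", so field 2 is the charset
    if [ "$(file -b -i "$F" | awk '{print $2}')" = "charset=utf-16le" ]; then
        # generic UTF-16 (not UTF-16LE) lets iconv read and drop the byte-order mark
        iconv -f UTF-16 -t US-ASCII "$F" > "u.$F"
    fi
done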
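What makes this work is that the first 128 Unicode code points are exactly the US-ASCII characters, so UTF-16 text that uses only those characters converts to US-ASCII without loss (iconv stops with an error on any character outside that range). The bytes themselves differ, since UTF-16 stores each character in two bytes, so only files that file reports as UTF-16 should be converted; the US-ASCII files are left alone.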
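A small demonstration of that relationship, assuming GNU (glibc) iconv and od from coreutils; the sample text "Hi" is arbitrary:

```shell
#!/bin/bash
# Encode the ASCII text "Hi" as UTF-16LE and dump the bytes: each character keeps
# its ASCII value as the low byte and gains a 0x00 high byte.
utf16_bytes=$(printf 'Hi' | iconv -f US-ASCII -t UTF-16LE | od -An -tx1)
echo "UTF-16LE bytes:$utf16_bytes"     # prints: UTF-16LE bytes: 48 00 69 00

# Decoding those bytes back to US-ASCII recovers the text unchanged.
roundtrip=$(printf 'Hi' | iconv -f US-ASCII -t UTF-16LE | iconv -f UTF-16LE -t US-ASCII)
echo "round trip: $roundtrip"          # prints: round trip: Hi

# A file that file(1) calls utf-16le starts with the byte-order mark FF FE;
# iconv's generic "UTF-16" uses it to pick the byte order and drops it.
bom_text=$(printf '\377\376H\000i\000' | iconv -f UTF-16 -t US-ASCII)
echo "with BOM: $bom_text"             # prints: with BOM: Hi

# "é" (UTF-8 bytes C3 A9) is outside the first 128 code points, so it has no
# US-ASCII equivalent and iconv exits with an error instead of converting it.
if printf '\303\251' | iconv -f UTF-8 -t US-ASCII > /dev/null 2>&1; then
    echo "converted"
else
    echo "iconv refused: no US-ASCII equivalent"
fi
```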
Upvotes: 2
Reputation: 5764
Please try the following command:
iconv -f FROM-ENCODING -t TO-ENCODING *.csv
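replacing FROM-ENCODING and TO-ENCODING with appropriate values. Note that this writes the converted contents of all the files, one after another, to standard output; it does not change the files themselves.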
To convert each file separately, you can use the following script, or adapt it to your needs:
for file in *.csv
do
iconv -f FROM-ENCODING -t TO-ENCODING "$file" > "$file.new"
done
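A variant of that loop that writes the converted copies into a separate directory and reports files iconv could not convert might look like this. It is only a sketch: the function name convert_all and its argument order are hypothetical, and it still assumes every file shares the same FROM-ENCODING, which is not the case for the mixed files in the question.

```shell
#!/bin/bash
# Hypothetical helper: convert every *.csv in SRC_DIR from one encoding to another,
# writing the results into OUT_DIR so the originals stay untouched until checked.
# OUT_DIR must differ from SRC_DIR, otherwise ">" would truncate the input first.
# Usage: convert_all FROM-ENCODING TO-ENCODING SRC_DIR OUT_DIR
convert_all() {
    local from=$1 to=$2 src=$3 out=$4 f name status=0
    mkdir -p -- "$out" || return 1
    for f in "$src"/*.csv; do
        [ -e "$f" ] || continue                # glob matched nothing
        name=${f##*/}                          # file name without the directory
        if ! iconv -f "$from" -t "$to" "$f" > "$out/$name"; then
            echo "failed: $f" >&2              # e.g. wrong FROM-ENCODING for this file
            rm -f -- "$out/$name"              # don't leave a partial copy behind
            status=1
        fi
    done
    return "$status"
}
```

For example, convert_all UTF-16 US-ASCII . converted would leave the originals in place and put the converted files in ./converted; a non-zero return status means at least one file failed.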
You can also use the recode command, which converts the file in place:
recode FROM-ENCODING..TO-ENCODING file.csv
Finally, see the question "Best way to convert text files between character sets?" if you are interested in learning more about iconv and/or recode.
Upvotes: 1