Reputation: 15703
I have to convert all the XML files in a folder from UTF-16 to UTF-8, remove the BOM, and then change the encoding name in each file's XML declaration from UTF-16 to UTF-8.
I'm using Cygwin to run a bash shell script to accomplish this, but I've never worked with sed before today and I need help!
I found a sed one-liner for removing the BOM; now I need another for replacing the text UTF-16 with UTF-8 in the XML header.
This is what I have so far:
#!/bin/bash
mkdir -p outUTF8
#Convert files to unix format.
find -exec dos2unix {} \;
#Use a for loop to convert all the xml files.
for f in `ls -1 *.xml`; do
sed -i -e '1s/^\xEF\xBB\xBF//' FILE
iconv -f utf-16 -t utf-8 $f > outUTF8/$f
sed 's/UTF-16/UTF-8/g' $f > outUTF8/$f
echo $f
done
However, this line:
sed 's/UTF-16/UTF-8/g' $f > outUTF8/$f
is hanging the script. Any ideas as to the proper format for this?
Upvotes: 0
Views: 2008
Reputation: 77085
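Try something like this:
for filename in *.xml; do
    # Strip a UTF-8 BOM from line 1 in place, keeping a .bak copy of the original.
    sed -i".bak" -e '1s/^\xEF\xBB\xBF//' "$filename"
    # Convert to UTF-8 and write the result into outUTF8/ under the same name.
    iconv -f utf-16 -t utf-8 "$filename" > outUTF8/"$filename"
    # Update the encoding name in the converted copy.
    sed -i 's/UTF-16/UTF-8/g' outUTF8/"$filename"
done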
The first sed makes a backup of each original file with the extension .bak. Then iconv converts the file and saves it under the same filename in the outUTF8 directory (which must already exist). Lastly, the second sed edits the converted copy in place to replace the text UTF-16 with UTF-8.
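One caveat worth spelling out: a UTF-16 file starts with the bytes FF FE or FE FF, so the pattern \xEF\xBB\xBF (the UTF-8 BOM) cannot match the source file before conversion. A minimal sketch of an alternative ordering, assuming GNU sed and iconv are available, is to convert first and then clean up the converted stream. The function name convert_xml_to_utf8 and its two directory arguments are made up for illustration and are not part of the answer above.

```shell
#!/bin/bash
# Hypothetical helper: convert every UTF-16 *.xml file in src_dir to UTF-8,
# writing the results to out_dir under the same filenames.
convert_xml_to_utf8() {
    local src_dir=$1 out_dir=$2 f name
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*.xml; do
        [ -e "$f" ] || continue      # the glob matched nothing
        name=${f##*/}                # filename without the directory part
        # 1. Decode UTF-16 (the BOM in the file selects the byte order).
        # 2. Drop a leading UTF-8 BOM from line 1 if iconv emitted one.
        # 3. Rewrite the encoding name on line 1 only.
        iconv -f UTF-16 -t UTF-8 "$f" |
            sed -e '1s/^\xEF\xBB\xBF//' -e '1s/UTF-16/UTF-8/' > "$out_dir/$name"
    done
}
```

It could be called as convert_xml_to_utf8 . outUTF8 from the folder holding the XML files. Because the originals are only read, no .bak copies are needed. The replacement is case-sensitive, so a declaration written as encoding="utf-16" would need an adjusted pattern.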
Upvotes: 2
Reputation: 37258
Two things:
1. How big is your $f file? If it's really, really big, it may just take a long time to complete.
2. Oops, I see you have an echo $f at the bottom of your loop. Move it before the sed command so you can see if there are any spaces in the filenames.
2a. Or just change all references to $f to "$f" to protect against spaces.
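To see why the quotes matter, here is a small sketch of what the shell does with an unquoted variable. The helper count_args and the filename are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: report how many arguments a command actually receives.
count_args() { echo "$#"; }

f='my file.xml'                 # example filename containing a space

unquoted=$(count_args $f)       # word splitting: "my" and "file.xml"
quoted=$(count_args "$f")       # one argument: the whole filename

echo "unquoted: $unquoted, quoted: $quoted"   # prints "unquoted: 2, quoted: 1"
```

With the unquoted form, sed and iconv would be handed two filenames that do not exist. The same splitting happens to the output of `ls -1 *.xml` in the original loop, which is why iterating over the glob directly (for f in *.xml) is the safer form.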
I hope this helps.
Upvotes: 1