How to search and replace text in an xml file with SED?

Question

I have to convert a folder of XML files from UTF-16 to UTF-8, remove the BOM, and then change the encoding keyword inside each file (in the XML header) from UTF-16 to UTF-8.

I'm using Cygwin to run a bash shell script to accomplish this, but I've never worked with SED before today and I need help!

I found a SED one-liner for removing the BOM; now I need another one for replacing UTF-16 with UTF-8 in the XML header.
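A minimal sketch of that header replacement on its own, assuming GNU sed (as shipped with Cygwin), a file that is already UTF-8, and an XML declaration on the first line written as `encoding="UTF-16"` with double quotes. `sample.xml` is a hypothetical file name used only for illustration:

```shell
#!/bin/bash
# Hypothetical sample file: XML declaration on line 1, already UTF-8.
printf '<?xml version="1.0" encoding="UTF-16"?>\n<root>UTF-16 in body text</root>\n' > sample.xml

# Address "1" limits the substitution to the first line, and matching
# encoding="..." leaves any other "UTF-16" text in the document alone.
# (sed -i with no backup suffix is a GNU sed feature.)
sed -i '1s/encoding="UTF-16"/encoding="UTF-8"/' sample.xml

cat sample.xml
```

Dropping the `1` address and the `encoding=` prefix gives back the broader `s/UTF-16/UTF-8/g`, which also rewrites any matching text in the body of the document.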

This is what I have so far:

#!/bin/bash
mkdir -p outUTF8

#Convert files to unix format.
find -exec dos2unix {} \;

#Use a for loop to convert all the xml files.
for f in `ls -1 *.xml`; do
    sed -i -e '1s/^\xEF\xBB\xBF//' FILE
    iconv -f utf-16 -t utf-8 $f > outUTF8/$f
    sed 's/UTF-16/UTF-8/g' $f > outUTF8/$f
    echo $f
done

However, this line:

sed 's/UTF-16/UTF-8/g' $f > outUTF8/$f

is hanging the script. Any ideas as to the proper format for this?

jaypal singh · Accepted Answer

Try something like this:

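mkdir -p outUTF8
for filename in *.xml; do
    # Convert to UTF-8, writing to outUTF8/ and leaving the original untouched
    iconv -f utf-16 -t utf-8 "$filename" > outUTF8/"$filename"
    # Strip a UTF-8 BOM from the converted copy, if there is one
    sed -i -e '1s/^\xEF\xBB\xBF//' outUTF8/"$filename"
    # Replace UTF-16 with UTF-8 (e.g. in the XML declaration)
    sed -i 's/UTF-16/UTF-8/g' outUTF8/"$filename"
done

This uses iconv to convert each file and save it under outUTF8 (created first by mkdir -p) with the same filename; the original file is not modified, so no .bak backup is needed. The first sed then removes a UTF-8 BOM from the start of the converted copy. It has to run after iconv: \xEF\xBB\xBF is the UTF-8 BOM, so it never matches the UTF-16 original. Lastly, the second sed makes an in-place change to the converted copy, replacing UTF-16 with UTF-8. Running sed -i on outUTF8/"$filename" also avoids the problem in the question's loop, where sed read the unconverted $f and its output overwrote the file iconv had just written to outUTF8/$f.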
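To check the result, here is a minimal sketch that builds a small UTF-16 test file, converts it, strips any UTF-8 BOM from the converted copy, rewrites the header, and then inspects the first bytes of the output. It assumes GNU iconv or libiconv, GNU sed (for `-i` and the `\xEF` escapes) and `od` are available, as in a typical Cygwin install; `test.xml` is a hypothetical file name:

```shell
#!/bin/bash
set -e
mkdir -p outUTF8

# Build a hypothetical UTF-16 test file; iconv's "utf-16" output
# normally starts with a BOM.
printf '<?xml version="1.0" encoding="UTF-16"?>\n<root/>\n' \
    | iconv -f utf-8 -t utf-16 > test.xml

# Convert, strip a UTF-8 BOM if present, then fix the header.
iconv -f utf-16 -t utf-8 test.xml > outUTF8/test.xml
sed -i -e '1s/^\xEF\xBB\xBF//' outUTF8/test.xml
sed -i 's/UTF-16/UTF-8/g' outUTF8/test.xml

# Show the first three bytes in hex, then the converted file.
head -c 3 outUTF8/test.xml | od -An -tx1
cat outUTF8/test.xml
```

The od line should print the bytes of `<?x` (3c 3f 78); seeing ef bb bf there instead would mean a BOM is still present.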
