James-Jesse Drinkard

Reputation: 15703

How to search and replace text in an xml file with SED?

I have to convert a folder of XML files from UTF-16 to UTF-8, remove the BOM, and then change the encoding declaration inside each file from UTF-16 to UTF-8.

I'm using Cygwin to run a bash script for this, but I had never worked with sed before today and I need help!

I found a sed one-liner for removing the BOM; now I need another for replacing UTF-16 with UTF-8 in the XML header.
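A minimal sketch of the header replacement, assuming GNU sed (as shipped with Cygwin), a file that has already been converted to UTF-8, and an XML declaration on line 1 that uses double quotes. Restricting the substitution to line 1 and to the `encoding="..."` attribute avoids rewriting other occurrences of "UTF-16" in the document body. The scratch directory and `sample.xml` are hypothetical.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1   # scratch directory for the demo

# Hypothetical sample that is already UTF-8 but still declares UTF-16.
printf '<?xml version="1.0" encoding="UTF-16"?>\n<note>saved as UTF-16</note>\n' > sample.xml

# Touch only line 1 and only the encoding attribute, so the body text
# "saved as UTF-16" is left as it is.
sed -i '1s/encoding="UTF-16"/encoding="UTF-8"/' sample.xml

cat sample.xml
```

If a declaration uses single quotes or lowercase `utf-16`, the pattern would need adjusting to match.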

This is what I have so far:

#!/bin/bash
mkdir -p outUTF8

#Convert files to unix format.
find -exec dos2unix {} \;

#Use a for loop to convert all the xml files.
for f in `ls -1 *.xml`; do
    sed -i -e '1s/^\xEF\xBB\xBF//' FILE
    iconv -f utf-16 -t utf-8 $f > outUTF8/$f
    sed 's/UTF-16/UTF-8/g' $f > outUTF8/$f
    echo $f
done

However, this line:

sed 's/UTF-16/UTF-8/g' $f > outUTF8/$f

is hanging the script. Any ideas as to the proper format for this?
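The snippet alone doesn't pin down the cause, but one common way for a line like this to appear to hang is for sed to receive no file operand: it then reads standard input and, in an interactive shell, sits waiting for keyboard input. A minimal demonstration, assuming bash and GNU sed; the variable `f` is deliberately left empty here (hypothetically, as if the loop variable had ended up empty), and input is piped in so the demo itself does not block:

```shell
#!/bin/bash
f=""   # hypothetical: the loop variable ended up empty

# Unquoted, the empty $f vanishes entirely, so sed gets no file operand
# and falls back to reading standard input (here, the piped echo).
unquoted=$(echo 'encoding="UTF-16"' | sed 's/UTF-16/UTF-8/g' $f)
echo "unquoted: $unquoted"

# Quoted, sed receives an empty file name and fails immediately
# instead of silently waiting on standard input.
if echo 'encoding="UTF-16"' | sed 's/UTF-16/UTF-8/g' "$f" 2>/dev/null; then
    echo "quoted: sed succeeded"
else
    echo "quoted: sed failed fast on the empty file name"
fi
```

Running the script with `bash -x` prints each command after expansion, which shows exactly what arguments sed receives.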

Upvotes: 0

Views: 2008

Answers (2)

jaypal singh

Reputation: 77085

Try something like this:

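mkdir -p outUTF8    # the output directory must exist before redirecting into it
for filename in *.xml; do
    # Remove a UTF-8 BOM (EF BB BF) from line 1, keeping a .bak copy of the original
    sed -i".bak" -e '1s/^\xEF\xBB\xBF//' "$filename"
    # Convert the encoding into a same-named file under outUTF8/
    iconv -f utf-16 -t utf-8 "$filename" > outUTF8/"$filename"
    # Replace every UTF-16 with UTF-8 in the converted copy (including the XML declaration)
    sed -i 's/UTF-16/UTF-8/g' outUTF8/"$filename"
done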

The first sed makes a backup of each original file with a .bak extension. iconv then converts the file and saves it under the same filename in the outUTF8 directory. Finally, the second sed edits that converted copy in place, replacing UTF-16 with UTF-8.
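A self-contained sketch for trying this on a throwaway file, assuming GNU sed and iconv (both standard in Cygwin). It differs from the loop above in one hedged way: it strips a UTF-8 BOM from the *converted* copy, because `\xEF\xBB\xBF` is the UTF-8 byte-order mark and will not match the `FF FE` / `FE FF` mark at the start of a UTF-16 file. The scratch directory and `sample.xml` are hypothetical.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1   # scratch directory so nothing real is touched

# Hypothetical UTF-16 input file built from a UTF-8 string.
printf '<?xml version="1.0" encoding="UTF-16"?>\n<root/>\n' \
    | iconv -f utf-8 -t utf-16 > sample.xml

mkdir -p outUTF8
for filename in *.xml; do
    # Convert first; iconv uses the UTF-16 byte-order mark to detect byte order.
    iconv -f utf-16 -t utf-8 "$filename" > outUTF8/"$filename"
    # Line 1 only: drop a UTF-8 BOM if one is present, then fix the declaration.
    # LC_ALL=C makes sed match the three BOM bytes byte by byte.
    LC_ALL=C sed -i -e '1s/^\xEF\xBB\xBF//' \
        -e '1s/encoding="UTF-16"/encoding="UTF-8"/' outUTF8/"$filename"
done

head -n 1 outUTF8/sample.xml
```

To run it against a real folder, drop the `cd` and `printf` lines; the originals stay untouched because every edit happens on the copies in `outUTF8`.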

Upvotes: 2

shellter

Reputation: 37258

Two things:

  1. How big is your $f file? If it's really big, it may just take a long time to complete.

  2. Oops, I see you have an echo $f at the bottom of your loop. Move it before the sed command so you can see whether there are any spaces in the filenames.

2a. :-) Or just change all references to $f to "$f" to protect against spaces.
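To see why quoting (and looping over a glob instead of `ls` output) matters, here is a small sketch, assuming bash and GNU ls, that creates one hypothetical file with a space in its name and counts how many times each style of loop runs:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
touch "my file.xml"   # hypothetical file name containing a space

# ls style (same as the question's backtick form): the output is split on
# whitespace, so one file becomes two iterations ("my" and "file.xml").
ls_count=0
for f in $(ls -1 *.xml); do
    ls_count=$((ls_count + 1))
done

# Glob style: the shell expands *.xml to whole file names, one per iteration.
glob_count=0
for f in *.xml; do
    [ -e "$f" ] || continue   # skip the literal pattern if nothing matches
    glob_count=$((glob_count + 1))
done

echo "ls loop: $ls_count iterations, glob loop: $glob_count iteration"
```

Inside the glob loop, each use of the variable still needs to be written as "$f" so the name is passed to sed and iconv as a single argument.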

I hope this helps.

Upvotes: 1
