J.Doe

Reputation: 413

Editing XML files using AWK

I have an XML file with some empty elements, e.g.

<elementA></elementA>

I am writing a bash script that fills the empty element with a user-specified value and saves the result as a new XML file.

awk "{gsub("<elementA></elementA>", "$XMLVALUE", $0); print $0)" $EMPTYFILE > $NEWFILE
# $EMPTYFILE is a bash variable containing the path of the XML file with the empty fields
# $NEWFILE is a bash variable containing the path of the new XML file that receives awk's redirected output
# $XMLVALUE is a bash variable containing the value to be inserted into the field

The output should be the original XML file, but with the empty element filled with the value of $XMLVALUE.

However, I get a variety of different errors depending on whether I use single or double quotes. I think the problem is that there are multiple levels of parsing (by bash and by awk), and I am mixing up the proper handling of bash variables vs. awk variables and the use of /.

Upvotes: 1

Views: 1064

Answers (1)

Charles Duffy

Reputation: 295650

awk is the wrong tool for this job.

  • It can't escape values to make them valid XML (changing Yellow & Blue to Yellow &amp; Blue, or 3<4 to 3&lt;4) unless you do that work yourself for every single value that needs escaping.
  • It can't recognize comments, CDATA sections, or other XML syntax.
  • It can't guarantee that the output after performing edits will be valid, conforming XML.
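
To make the first point concrete, here is a minimal sketch of the escaping you would have to bolt on yourself before handing a value to awk. The name xml_escape is hypothetical (it is not part of any tool mentioned here), and the sketch covers only element text content; attribute values would also need their quotes escaped, and nothing here addresses the other two points.

```shell
# Hypothetical helper: escape the characters that are special in XML text
# content. "&" is handled first so that the "&" introduced by "&lt;" and
# "&gt;" is not escaped a second time.
xml_escape() {
  printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

xml_escape 'Yellow & Blue'   # prints: Yellow &amp; Blue
xml_escape '3<4'             # prints: 3&lt;4
```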

Instead, use XMLStarlet:

xmlstarlet ed -u '//elementA' -v "$value" <in.xml >out.xml

That said, to safely pass shell variables' values to awk, use -v:

# Don't actually use this for XML!
awk -v in_string="$in_string" -v out_string="$out_string" \
  '{gsub(in_string, out_string); print}' \
  "$in_file" > "$out_file"

If you want awk to be dealing with literals, however, even this isn't good enough. See the gsub_literal function provided in BashFAQ #21.
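
The reason is that gsub treats its first argument as a regular expression and gives & a special meaning in the replacement, and -v additionally processes backslash escape sequences in the value it is given. As a rough sketch of the literal-replacement idea (this is not the BashFAQ function itself; literal_replace and the environment variable names search and rep are made up for illustration), the strings can be passed through the environment and matched with index() instead of a regex:

```shell
# Sketch only; this is still not a safe way to edit XML.
# Replace every literal occurrence of $1 with $2, reading stdin and writing
# stdout. The values travel through the environment (ENVIRON) rather than -v,
# so backslashes reach awk untouched.
literal_replace() {
  search=$1 rep=$2 awk '
    BEGIN { s = ENVIRON["search"]; r = ENVIRON["rep"]; n = length(s) }
    {
      out = ""
      # index() does a plain substring search, so nothing in s is a regex.
      while (n > 0 && (i = index($0, s)) > 0) {
        out = out substr($0, 1, i - 1) r   # text before the match, then replacement
        $0 = substr($0, i + n)             # continue after the match
      }
      print out $0
    }
  '
}

# With gsub, "a.b" would also match "axb", and "&" would expand to the match.
printf '%s\n' 'a.b axb' | literal_replace 'a.b' 'x&y'   # prints: x&y axb
```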

Upvotes: 4
