LaDude

Reputation: 1403

awk one liner: replace xml tags

I have an XML file containing some attributes like this:

<string name="my/ attribute" optional="true">
  <description>some text</description>
  <value>some text again</value>
</string>

I would like to replace the value (which is not necessarily "some text again") with the string "none". I tried the following on the command line:

 awk '/<string name="my\/ attribute" optional="true">/,/<\/string>/ {sub(/<value>(.*)<\/value>/,"<value>none</value>")}1' my.xml > my_new.xml

This mostly works, but the result is as follows:

<string name="my/ attribute" optional="true">
  <description>some text</description>
  <value>some text again<\/value>
</string>

Why is the / (slash) in the tag escaped?

Thanks a lot for your help,

Daniela.

Upvotes: 0

Views: 2581

Answers (2)

Michael Kay

Reputation: 163322

Don't use standard text tools to process XML - always use XML tools. Otherwise you (or your customers) will end up among the hundreds of people who post questions on this list asking what to do about the fact that they have ill-formed XML to process. It's simply too hard to get it right by hand, catering for all the edge cases that can arise. For example, do you know the rules for where whitespace is allowed within start and end tags? Judging from your sample code, you don't appear to.
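
To make the whitespace point concrete, here is a minimal sketch. It assumes a POSIX shell and awk, and the sample input is hypothetical: it is the element from the question with a space added before the closing > of each tag, which XML allows in both start and end tags. The substitution from the question no longer matches, so the value is left untouched:

```shell
# Hypothetical sample: the same element as in the question, but with
# whitespace before the closing ">" of each tag, which XML permits.
xml='<string name="my/ attribute" optional="true" >
  <description>some text</description>
  <value >some text again</value >
</string >'

# The regex from the question expects the exact spelling "<value>...</value>",
# so the substitution never fires and the input comes back unchanged.
result=$(printf '%s\n' "$xml" |
  awk '{ sub(/<value>(.*)<\/value>/, "<value>none</value>") } 1')

printf '%s\n' "$result"
```

An XML parser treats <value > and <value> as the same tag, which is why a regex keyed to one exact spelling is fragile.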

Upvotes: 0

ghoti

Reputation: 46856

Assuming the inconsistencies in your question that Richard pointed out are accidental:

$ cat input.xml
<string name="my/ attribute" optional="true">
  <description>some text</description>
  <value>some text again</value>
</string>

$ awk '/<string/{doit=1} doit{sub(/<value>[^<]+<\/value>/, "<value>none</value>"); print} /<\/string>/{doit=0}' input.xml 
<string name="my/ attribute" optional="true">
  <description>some text</description>
  <value>none</value>
</string>

$ 

This is a WEE bit safer than your script, in that it will handle minified XML (i.e. whitespace removed, all on one line), but it won't handle a <value> that is split over multiple lines.
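
Two caveats about the command above that are easy to miss: it matches every <string> element rather than only the one named "my/ attribute", and it prints only the lines inside those elements. The following is one possible variant, a sketch rather than a robust solution, that keys on the name attribute and prints every line. The sample input and the variable name inside are made up for illustration:

```shell
# Hypothetical input: two <string> elements; only the first should change.
xml='<config>
<string name="my/ attribute" optional="true">
  <description>some text</description>
  <value>some text again</value>
</string>
<string name="other" optional="true">
  <value>keep me</value>
</string>
</config>'

result=$(printf '%s\n' "$xml" | awk '
  /<string name="my\/ attribute"/ { inside = 1 }   # enter the target element
  inside { sub(/<value>[^<]*<\/value>/, "<value>none</value>") }
  /<\/string>/ { inside = 0 }                      # leave it again
  { print }                                        # print every line, changed or not
')

printf '%s\n' "$result"
```

Here [^<]* is used instead of [^<]+ so that an empty <value></value> is replaced as well. The variant still shares the limitations already mentioned: a <value> split across lines, or tags written with extra whitespace, will not be matched.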

I do recommend looking into Perl's XML::Simple or PHP's SimpleXML. It won't be a one-liner, but it will work MUCH more reliably.

Upvotes: 1
