Nejc Kikelj

Reputation: 35

Remove or replace double quotes from values of XML (specific search and replace)

I have the following XML:

<smtng attr="bla"><desc>bla 12" bla</desc></smtng>

I would like to use a command (preferably one I can run from bash) that replaces the " after 12 with &quot; but leaves the quotes in attr="bla" untouched.

Any ideas?

Upvotes: 2

Views: 1260

Answers (3)

jeff

Reputation: 1

Using xmlstarlet you can do the following:

# cf. http://www.exslt.org/str/index.html
echo '<smtng attr="bla"><desc>bla 12" bla</desc></smtng>' |
xmlstarlet sel -T -t -m "//smtng/desc" -v "str:replace(.,'&quot;','&amp;quot;')" -n

Upvotes: 0

nhed

Reputation: 6001

#!/bin/bash

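# Put every <desc> element on its own line, then escape quotes on those lines only.
sed -e $'s@<desc>@\\\n<desc>@g' -e $'s@</desc>@</desc>\\\n@g' | \
while IFS=$'\n\r' read -r line; do   # -r keeps backslashes in the input intact
  case "${line}" in
    *"<desc>"*)
    # g escapes every double quote in the element, not just the first one
    sed 's@"@\&quot;@g' <<<"${line}"
    ;;

    *)
    printf '%s\n' "${line}"
    ;;
  esac
done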

Out of laziness, I adapted my earlier answer: the first sed isolates each <desc> element on its own line, and the loop then escapes the quotes on those lines only.

Upvotes: 1

Dennis Williamson

Reputation: 360143

This may work, but you should really use a proper XML tool. Note that it relies on GNU sed extensions (\? in the pattern and \n in the replacement).

sed 's|</\?desc>|\n&|g; s/\(<desc>[^"]*\)"\([^\n]*\n\)/\1\&quot;\2/g;s/\n//g' inputfile
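
The same idea can also be sketched in awk, which avoids inserting and removing newline markers and escapes every quote in the element rather than only the first. This is only a sketch under some assumptions: the function name escape_desc_quotes is made up for illustration, and it assumes that <desc> elements carry no attributes, are not nested, and open and close on the same line. Like the sed versions, it does not parse the XML.

```shell
# Hypothetical helper: escape double quotes only inside <desc>...</desc> text.
# Reads XML-ish text on stdin and writes the result to stdout.
escape_desc_quotes() {
  awk '{
    out = ""
    rest = $0
    # Walk through the line one <desc>...</desc> element at a time.
    while (1) {
      start = index(rest, "<desc>")
      if (start == 0) break
      tail = substr(rest, start + 6)          # text after the opening tag
      len = index(tail, "</desc>")
      if (len == 0) break                     # unclosed element: leave the rest untouched
      body = substr(tail, 1, len - 1)         # element text only
      gsub(/"/, "\\&quot;", body)             # \\& yields a literal ampersand
      out = out substr(rest, 1, start + 5) body "</desc>"
      rest = substr(tail, len + 7)            # continue after the closing tag
    }
    print out rest
  }'
}
```

Usage would look like this, with inputfile being the same placeholder name as above:

```shell
escape_desc_quotes < inputfile
```

For the sample line from the question, this should print <smtng attr="bla"><desc>bla 12&quot; bla</desc></smtng>, leaving the attribute quotes alone.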

Upvotes: 1
