I Z

Reputation: 5927

bash scripting: how do I replace these lines matching a pattern with another line?

I need to process a set of XML files in the current directory using sed or a similar utility from within a bash script.

In each file that has either of the following lines (there might be 0 or 1 of them in a file)

    <MetaDatum key="Pr" value="VALUE (foobar)" />
    <MetaDatum key="Pr" value="VALUE (xyz12345678)" />

I need to replace that entire line with

    <MetaDatum key="Pr" value="VALUE" />

So what I need to do is essentially map VALUE (foobar) and VALUE (xyz12345678) into VALUE.

So what operation should I use inside this loop:

for f in `grep -l "MetaDatum key=\"Pr\" value=\"VALUE" *.xml`
do
     # replace one entire line in $f with '<MetaDatum key="Pr" value="VALUE" />'
done

Upvotes: 0

Views: 72

Answers (4)

John Bollinger

Reputation: 180266

Supposing that the pattern in your grep command identifies all the lines that need to be modified and no other lines, you could write a sed command that matches the same lines, and substitutes the value of the value attribute on it:

sed '/MetaDatum key="Pr" value="VALUE/ s/value="[^"]*"/value="VALUE"/' "$f"

Note, however, that this approach (both the grep and the sed) is very sensitive to the exact details of your XML. It will fall over on amounts of whitespace other than what you expect (especially embedded newlines), on extra attributes, on a different choice of quotes, and so on.
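For reference, here is a minimal sketch of how that sed command could sit inside the loop from the question. The sample file and the temporary directory are made up for illustration. Writing the result back with cat rather than mv is just one possible choice; it leaves the original file's inode, owner, and mode alone.

```shell
# Hypothetical sample input, created only so the sketch is self-contained.
workdir=$(mktemp -d)
cat > "$workdir/demo.xml" <<'EOF'
<Root>
    <MetaDatum key="Pr" value="VALUE (foobar)" />
    <MetaDatum key="Other" value="VALUE (keep)" />
</Root>
EOF

pattern='MetaDatum key="Pr" value="VALUE'

for f in "$workdir"/*.xml; do
    # Only touch files that contain a matching line.
    grep -q "$pattern" "$f" || continue

    tmp=$(mktemp) || break
    # sed writes to stdout, so capture the result and copy it back.
    if sed "/$pattern/ s/value=\"[^\"]*\"/value=\"VALUE\"/" "$f" > "$tmp"; then
        cat "$tmp" > "$f"
    fi
    rm -f "$tmp"
done

cat "$workdir/demo.xml"
```

The same caveats apply: this only works while the files keep exactly the formatting shown in the question.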

Some of those can be addressed by smarter patterns, but others not. To properly process XML, you want bona fide XML tools. In this case, an appropriate tool would be an XSLT transform. Here's a transform that would do the job (provided that the source file has not overridden the default XML namespace -- thanks, CharlesDuffy):

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity transform: anything not otherwise matched is copied verbatim -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!--
    -- Transform the 'value' attribute of MetaDatum elements where the
    -- element has a 'key' attribute with value 'Pr', and the 'value'
    -- attribute's own value starts with 'VALUE'.
    -->
  <xsl:template match="MetaDatum[@key = 'Pr']/@value[substring(., 1, 5) = 'VALUE']">
    <xsl:attribute name="value">VALUE</xsl:attribute>
  </xsl:template>

</xsl:stylesheet>

You could apply it via any XSLT processor, but one of the more common of those is xsltproc, which comes with GNOME's libxslt. If the transform is stored in a file named meta.xsl, then the commands to replace a file $f with the transformed output via xsltproc might be:

temp=$(mktemp) && xsltproc meta.xsl "$f" > "$temp" && mv "$temp" "$f"

As @CharlesDuffy observed in comments, that may result in the file named by $f having different ownership and/or more restrictive permissions than it did before. How you might resolve that issue depends on the available tools. For example, although the standard chown and chmod don't have it, the GNU versions have a mechanism for setting the ownership and permissions of a file to match those of a different file. Additionally, you'll want to consider what the desired behavior is when $f names a symbolic link (replace the link, or modify the file to which it points). Since these are environment- and preference-dependent matters, if the command presented above does not handle them as you like then you'll need to decide how to revise the approach.
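As one illustration of the GNU mechanism mentioned above, the sketch below copies the original file's mode onto the temporary file before moving it into place. It assumes GNU chmod (for --reference) and GNU stat. The function name replace_keep_mode and the sample file are hypothetical, and sed stands in for xsltproc only so that the example runs without extra tools. It does not address ownership (changing a file's owner usually requires elevated privileges) or symbolic links.

```shell
# replace_keep_mode FILE CMD [ARGS...]   (hypothetical helper name)
# Runs "CMD ARGS... FILE" and replaces FILE with the command's stdout,
# giving the new file the same permission bits as the old one.
replace_keep_mode() {
    target=$1
    shift
    tmp=$(mktemp) || return 1
    # chmod --reference is a GNU extension, not part of POSIX chmod.
    if "$@" "$target" > "$tmp" &&
       chmod --reference="$target" "$tmp" &&
       mv "$tmp" "$target"
    then
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Demo on a throwaway file; sed stands in for `xsltproc meta.xsl`.
demo=$(mktemp)
printf '<MetaDatum key="Pr" value="VALUE (foobar)" />\n' > "$demo"
chmod 640 "$demo"
replace_keep_mode "$demo" sed 's/value="[^"]*"/value="VALUE"/'
stat -c '%a %n' "$demo"
```

With xsltproc installed, the call inside the loop would presumably be replace_keep_mode "$f" xsltproc meta.xsl.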

If you need to deal with an overridden default XML namespace, then the template will need to be a bit more complicated. You'll need to declare a namespace prefix for the namespace of the MetaDatum element, and to use it wherever you refer to that element name. Unprefixed attributes such as key and value are not in the default namespace, so references to them stay as they are.

Upvotes: 1

Charles Duffy

Reputation: 295443

You can't reliably use sed for this job: XML can be written out in too many different ways. (For instance, your document could have the key and value attributes on different lines from the element they belong to, or could put "value" before "key", or could start using named namespaces and thus add foo: prefixes to things.) There's no guarantee that future versions of your input file will be generated with the exact same formatting, particularly as the code that generates it changes.

Instead, use an XML-aware tool such as XMLStarlet:

xmlstarlet ed \
  -u '//MetaDatum[@key="Pr"]/@value' \
  -v "VALUE" \
  <in.xml >out.xml

Note that if there's an xmlns="..." declaration at an enclosing scope in your file, this will change the expression above a bit. (This also means that your file format is using namespaces, so it is particularly likely to change in ways sed can't handle!)

For instance, if the top of your file starts with something like <root xmlns="http://example.com/foo">, then you'd need to do the following:

xmlstarlet ed \
  -N "foo=http://example.com/foo" \
  -u '//foo:MetaDatum[@key="Pr"]/@value' \
  -v "VALUE" \
  <in.xml >out.xml

By the way, if you'd rather perform edits in place, xmlstarlet ed has a -L (--inplace) option for that (-i is the insert action); thus xmlstarlet ed -L [...] changeme.xml will overwrite changeme.xml with the modified version, allowing the find one-liners shown in some other answers to be leveraged.
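To tie this back to the loop in the question, here is a rough sketch of an in-place run over several files. It uses xmlstarlet's -L (--inplace) option, and it adds a starts-with() predicate so that only values beginning with "VALUE" are rewritten, which is an assumption about what the question intends. The sample file is hypothetical, and the sketch skips itself if xmlstarlet is not on the PATH (some distributions install the binary under a different name).

```shell
# Hypothetical sample input so the sketch is self-contained.
workdir=$(mktemp -d)
cat > "$workdir/demo.xml" <<'EOF'
<Root>
  <MetaDatum key="Pr" value="VALUE (foobar)"/>
  <MetaDatum key="Pr" value="OTHER (untouched)"/>
</Root>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
    for f in "$workdir"/*.xml; do
        # -L edits the file in place; the predicate limits the update to
        # values that begin with "VALUE", mirroring the grep in the question.
        xmlstarlet ed -L \
            -u '//MetaDatum[@key="Pr"]/@value[starts-with(., "VALUE")]' \
            -v 'VALUE' \
            "$f"
    done
    cat "$workdir/demo.xml"
else
    echo "xmlstarlet not found; skipping" >&2
fi
```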

Upvotes: 1

Giuseppe Ricupero

Reputation: 6272

Use this sed one-liner command to modify all the xml files (inside the current directory) in place:

sed -i 's,\(<MetaDatum\s*key="Pr"\s*value="VALUE\).*\s*/>,\1" />,' *.xml

You can also keep a backup copy of the previous version by adding a suffix after the -i switch, such as .bak (or ~):

sed -i.bak 's,\(<MetaDatum\s*key="Pr"\s*value="VALUE\).*\s*/>,\1" />,' *.xml

Combine the command with the find tool to apply sed to every file with an .xml extension (case-insensitive) found in the target directory or its subdirectories:

find "${targetDir}" -type f -iname "*.xml" -exec sed -i 's,\(<MetaDatum\s*key="Pr"\s*value="VALUE\).*\s*/>,\1" />,' {} \;
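Before running any of these in-place commands, it may be worth previewing what the substitution would change. The sketch below is one way to do that: it uses sed -n with the p flag so that only modified lines are printed and nothing is written to disk. It swaps the GNU-specific \s for the POSIX class [[:space:]], and the sample file is hypothetical.

```shell
# Hypothetical sample file for the preview.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Root>
    <MetaDatum key="Pr" value="VALUE (xyz12345678)" />
    <MetaDatum key="Other" value="unchanged" />
</Root>
EOF

# -n suppresses normal output; the trailing p prints only the lines that
# the substitution changed, so the file itself is left untouched.
script='s,\(<MetaDatum[[:space:]]*key="Pr"[[:space:]]*value="VALUE\).*[[:space:]]*/>,\1" />,p'
preview=$(sed -n "$script" "$sample")
printf '%s\n' "$preview"
```

If the preview shows only the lines you expect, the same expression without -n and p can then be used with -i.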

Upvotes: 0

choroba

Reputation: 241898

Instead of sed, use a tool that properly parses the XML. For example, in xsh you can use

for $file in { glob '*.xml' } {
    open $file ;
    for //MetaDatum/@value
        set . xsh:subst(., 'VALUE \(.*', 'VALUE') ;
    save :b ;
}

Upvotes: 2
