Reputation: 5927
I need to process a set of XML files in the current directory using sed or a similar utility from within a bash script.
In each file that has either of the following lines (there might be 0 or 1 of them in a file):
<MetaDatum key="Pr" value="VALUE (foobar)" />
<MetaDatum key="Pr" value="VALUE (xyz12345678)" />
I need to replace that entire line with
<MetaDatum key="Pr" value="VALUE" />
So what I need to do is essentially map VALUE (foobar) and VALUE (xyz12345678) into VALUE.
So what operation should I use inside this loop:
for f in `grep -l "MetaDatum key=\"Pr\" value=\"VALUE" *.xml`
do
# replace one entire line in $f with '<MetaDatum key="Pr" value="VALUE" />'
done
Upvotes: 0
Views: 72
Reputation: 180266
Supposing that the pattern in your grep command identifies all the lines that need to be modified and no other lines, you could write a sed command that matches the same lines and replaces the content of their value attribute (as written, it prints the result to standard output; GNU sed's -i option makes it edit the file in place instead):
sed '/MetaDatum key="Pr" value="VALUE/ s/value="[^"]*"/value="VALUE"/' "$f"
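Dropped into the loop from the question, that command might look like the following sketch. It assumes GNU sed (for -i with no backup suffix), and fix_pr_values is only a hypothetical name. Looping over the glob itself, rather than over backtick output from grep -l, keeps file names containing spaces in one piece.

```bash
# fix_pr_values FILE...  (hypothetical helper)
# Rewrites value="VALUE (...)" to value="VALUE" on MetaDatum key="Pr" lines,
# editing each file in place. Assumes GNU sed (-i with no backup suffix).
fix_pr_values() {
    local f
    for f in "$@"; do
        [ -f "$f" ] || continue   # skips an unmatched glob such as a literal '*.xml'
        # Only touch files that contain the line: sed -i rewrites every file
        # it is given, even when nothing in it changes.
        if grep -q 'MetaDatum key="Pr" value="VALUE' "$f"; then
            sed -i '/MetaDatum key="Pr" value="VALUE/ s/value="[^"]*"/value="VALUE"/' "$f"
        fi
    done
}

# Usage: fix_pr_values *.xml
```

The grep -q check only avoids rewriting files that have no matching line; sed's own address would leave their contents alone either way.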
Note, however, that this approach (both grep and sed) is very sensitive to the exact details of your XML. It will break on whitespace that differs from what you expect (especially embedded newlines), on extra attributes, on a different choice of quotes, and so on.
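To make that concrete, here is a small sketch that feeds the same sed command four spellings of one element, all equivalent to an XML parser (the sample lines are invented for illustration, and show_sed_fragility is a hypothetical name). Only the first spelling is rewritten; the single-quoted, reordered and line-wrapped ones come out unchanged.

```bash
# show_sed_fragility  (hypothetical helper)
# Pipes four equivalent spellings of the same element through the sed command.
show_sed_fragility() {
    printf '%s\n' \
        '<MetaDatum key="Pr" value="VALUE (foobar)" />' \
        "<MetaDatum key='Pr' value='VALUE (foobar)' />" \
        '<MetaDatum value="VALUE (foobar)" key="Pr" />' \
        '<MetaDatum key="Pr"' \
        '           value="VALUE (foobar)" />' |
    sed '/MetaDatum key="Pr" value="VALUE/ s/value="[^"]*"/value="VALUE"/'
}

# Usage: show_sed_fragility
```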
Some of those can be addressed by smarter patterns, but others cannot. To process XML properly, you want bona fide XML tools; in this case, an appropriate tool would be an XSLT transform. Here's a transform that would do the job (provided that the source file has not overridden the default XML namespace; thanks, @CharlesDuffy):
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity transform: anything not otherwise matched is copied verbatim -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!--
    Transform the 'value' attribute of MetaDatum elements where the
    element has a 'key' attribute with value 'Pr', and the 'value'
    attribute's own value starts with 'VALUE'.
  -->
  <xsl:template match="MetaDatum[@key = 'Pr']/@value[substring(., 1, 5) = 'VALUE']">
    <xsl:attribute name="value">VALUE</xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
You could apply it via any XSLT processor, but one of the more common ones is xsltproc, which comes with GNOME's libxslt. If the transform is stored in the file meta.xsl, then the commands to replace a file $f with the transformed output via xsltproc might be:
temp=$(mktemp) && xsltproc meta.xsl "$f" > "$temp" && mv "$temp" "$f"
As @CharlesDuffy observed in comments, that may leave the file named by $f with different ownership and/or more restrictive permissions than it had before. How you might resolve that issue depends on the available tools. For example, although standard chown and chmod don't offer it, the GNU versions can set a file's ownership and permissions to match those of another file (their --reference option). Additionally, you'll want to consider the desired behavior when $f names a symbolic link (replace the link, or modify the file it points to). Since these are environment- and preference-dependent matters, if the command presented above does not handle them as you like, then you'll need to decide how to revise the approach.
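A sketch of the GNU route, assuming GNU coreutils (chmod and chown with --reference); replace_file_with_filter is a hypothetical helper name. It creates the temporary file next to the target, so the final mv is a rename within one file system, and it copies the original's mode (and owner and group, where the caller is permitted to) onto the temporary file first. Like the one-liner above, it replaces a symbolic link with a regular file rather than writing through it.

```bash
# replace_file_with_filter FILE CMD [ARGS...]  (hypothetical helper)
# Runs "CMD ARGS... FILE" and swaps its output in for FILE, keeping FILE's
# permission bits and, where allowed, its owner and group.
# Assumes GNU coreutils for chmod/chown --reference.
replace_file_with_filter() {
    local f=$1 temp
    shift
    # Temp file in the same directory, so the final mv is a same-filesystem rename.
    temp=$(mktemp "$f.XXXXXX") || return 1
    if ! "$@" "$f" > "$temp"; then
        rm -f -- "$temp"          # leave the original untouched on failure
        return 1
    fi
    chmod --reference="$f" -- "$temp"
    # Changing the owner usually needs root; ignore failure for ordinary users.
    chown --reference="$f" -- "$temp" 2>/dev/null || true
    mv -f -- "$temp" "$f"
}

# Usage with the stylesheet above:
#   for f in *.xml; do replace_file_with_filter "$f" xsltproc meta.xsl; done
```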
If you need to deal with an overridden default XML namespace, then the template will need to be a bit more complicated. You'll need to declare a namespace prefix for the namespace of the MetaDatum element and use it wherever you refer to that element's name. (Unprefixed attributes such as key and value are not placed in the default namespace, so they keep their plain names.)
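For example, if the root element declares xmlns="http://example.com/foo" (a placeholder URI; substitute the one your files actually use), the stylesheet changes only in binding a prefix to that URI and using it on the element name. The sketch below keeps everything in shell by having a hypothetical function, print_ns_stylesheet, emit that variant of meta.xsl; foo is an arbitrary prefix.

```bash
# print_ns_stylesheet  (hypothetical helper)
# Prints a namespace-aware variant of meta.xsl. http://example.com/foo is a
# placeholder; use the URI from your files' xmlns="..." declaration.
print_ns_stylesheet() {
    cat <<'EOF'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foo="http://example.com/foo">
  <!-- identity transform: anything not otherwise matched is copied verbatim -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Only the element name takes the prefix; key and value are unprefixed
       attributes and therefore in no namespace. -->
  <xsl:template match="foo:MetaDatum[@key = 'Pr']/@value[substring(., 1, 5) = 'VALUE']">
    <xsl:attribute name="value">VALUE</xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
EOF
}

# Usage:
#   print_ns_stylesheet > meta-ns.xsl
#   xsltproc meta-ns.xsl "$f"
```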
Upvotes: 1
Reputation: 295443
You can't reliably use sed for this job: XML can be written out in too many different ways. (For instance, your document could have the key and value attributes on different lines from the element they apply to, could put "value" before "key", or could start using named namespaces and thus add foo: prefixes to things.) There's no guarantee that future versions of your input file will be generated with exactly the same formatting, particularly as the code that generates it changes.
Instead, use an XML-aware tool such as XMLStarlet:
xmlstarlet ed \
-u '//MetaDatum[@key="Pr"]/@value' \
-v "VALUE" \
<in.xml >out.xml
Note that if there's an xmlns="..." declaration at an enclosing scope in your file, the expression above will need to change a bit. (It also means that your file format is using namespaces, so it is particularly likely to change in ways sed can't handle!)
For instance, if the top of your file starts with something like <root xmlns="http://example.com/foo">, then you'd need to do the following:
xmlstarlet ed \
-N "foo=http://example.com/foo" \
-u '//foo:MetaDatum[@key="Pr"]/@value' \
-v "VALUE" \
<in.xml >out.xml
By the way, if you'd rather perform edits in place, xmlstarlet ed has a -L (--inplace) option for that (its -i option means "insert", not "in place"); thus xmlstarlet ed -L [...] changeme.xml will write out a modified version of changeme.xml, allowing the find one-liners shown in some other answers to be leveraged.
Upvotes: 1
Reputation: 6272
Use this sed one-liner command to modify all the xml files (inside the current directory) in place:
sed -i 's,\(<MetaDatum\s*key="Pr"\s*value="VALUE\).*\s*/>,\1" />,' *.xml
You can also keep a backup copy of the previous version by adding a suffix right after the -i switch, such as .bak (or ~, if you prefer):
sed -i.bak 's,\(<MetaDatum\s*key="Pr"\s*value="VALUE\).*\s*/>,\1" />,' *.xml
Combine the command with the find tool to apply sed to files with an .xml extension (case-insensitive) in the target directory or its subdirectories:
find "${targetDir}" -type f -iname "*.xml" -exec sed -i 's,\(<MetaDatum\s*key="Pr"\s*value="VALUE\).*\s*/>,\1" />,' {} \;
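Since -i rewrites files immediately, it can be worth previewing the effect first. A minimal sketch (preview_pr_edit is a hypothetical helper name; \s is a GNU sed extension, as in the commands above) that runs the same expression without -i and prints a unified diff for each file, leaving the files themselves untouched:

```bash
# preview_pr_edit FILE...  (hypothetical helper)
# Shows, as a unified diff, what the in-place sed command would change.
preview_pr_edit() {
    local f
    for f in "$@"; do
        sed 's,\(<MetaDatum\s*key="Pr"\s*value="VALUE\).*\s*/>,\1" />,' "$f" |
            diff -u "$f" -
    done
    return 0   # diff exits 1 whenever it finds differences; not an error here
}

# Usage: preview_pr_edit *.xml
```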
Upvotes: 0