Reputation: 2441
I have the following XML file. I want to edit it by removing the url and title attributes from every <doc></doc> element. I am looking for a Unix command that can do this, rather than writing a whole program.
<documents>
<doc id="852" url="http://en.wikipedia.org/wiki?curid=852" title="...">
<text>
Some text...
</text>
</doc>
<doc id="853" url="http://en.wikipedia.org/wiki?curid=853" title="...">
<text>
Some text...
</text>
</doc>
<doc id="854" url="http://en.wikipedia.org/wiki?curid=854" title="...">
<text>
some text...
</text>
</doc>
</documents>
Upvotes: 2
Views: 2242
Reputation: 10264
If the XML is as consistent as this, a simple command that could work is:
sed -r 's/^(<doc .* )url=".*/\1>/' myfile.xml
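That matches lines that start with a <doc tag, keeps everything up to url, discards the rest of the line, and closes the tag again with a new >. It relies on url and title being the last attributes on the line, in that order, as they are in your sample. (-r turns on extended regular expressions in GNU sed.)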
You could get more careful with the regex, but sed is a good tool for this, IF the XML is totally predictable.
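As one sketch of a more careful regex, the version below removes each url="..." and title="..." attribute on its own, so it does not matter in which order they appear or whether other attributes follow them. It assumes the whole <doc ...> tag sits on one line starting at column 0, and that attribute values are double-quoted and contain no double quotes themselves. The function name strip_doc_attrs is only an illustrative wrapper, not something from the question.

```shell
# strip_doc_attrs FILE
# Print FILE with url="..." and title="..." removed from every line that
# starts with "<doc ". Other lines are passed through unchanged.
# Assumes double-quoted attribute values with no embedded double quotes.
strip_doc_attrs() {
    sed -E '/^<doc /s/ (url|title)="[^"]*"//g' "$1"
}
```

Call it as strip_doc_attrs myfile.xml > cleaned.xml. The /^<doc / address limits the substitution to the opening tags, so text inside <text> that happens to contain url="..." is left alone.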
If you want to change the file in place, add -i to the sed invocation.
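A hedged sketch of the in-place form: a bare -i works with GNU sed, but BSD/macOS sed requires an argument to -i, so attaching a backup suffix (here .bak, an arbitrary choice) is accepted by both and also keeps a copy of the original. The function name is hypothetical, and -E is used in place of -r for the same portability reason.

```shell
# strip_doc_tail_inplace FILE
# Edit FILE in place with the same substitution as above, leaving the
# original content in FILE.bak. Like the one-liner, this assumes url and
# title are the last attributes on each <doc ...> line.
strip_doc_tail_inplace() {
    sed -i.bak -E 's/^(<doc .* )url=".*/\1>/' "$1"
}
```

After strip_doc_tail_inplace myfile.xml, check the result and delete myfile.xml.bak once it looks right.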
Upvotes: 3