Hani Goc

Reputation: 2441

How to delete attributes in an XML file using a Linux command (e.g. sed)

I have the following XML file. I want to edit it by removing the url and title attributes from every <doc> element. I am looking for a Unix command that can do this instead of writing a whole program.


<documents>
<doc id="852" url="http://en.wikipedia.org/wiki?curid=852" title="...">
<text>
 Some text...
</text>
</doc>

<doc id="853" url="http://en.wikipedia.org/wiki?curid=853" title="...">
<text>
 Some text...
</text>
</doc>

<doc id="854" url="http://en.wikipedia.org/wiki?curid=854" title="...">
<text>
 some text...
</text>
</doc>

</documents>

Upvotes: 2

Views: 2242

Answers (1)

Micah Elliott

Reputation: 10264

If the XML is as consistent as this, a simple command that could work is:

sed -r 's/^(<doc .* )url=".*/\1>/' myfile.xml

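That matches lines that start with a <doc tag, keeps (captures) everything up to the space before url=, discards the rest of the line, and closes the tag again with a new >. Throwing away the rest of the line removes title as well, which only works because title comes after url and is the last attribute on the line.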
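To see what the command does, here is a small run against a sample in the question's format (the mktemp file is just for the demo). Only the <doc ...> line is rewritten; <documents>, <text> and </doc> don't match ^<doc followed by a space, so they pass through untouched. Note that the space before url= is kept, so the tag comes out as <doc id="852" >, which is still valid XML.

```shell
# Write a small sample in the question's format to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<documents>
<doc id="852" url="http://en.wikipedia.org/wiki?curid=852" title="...">
<text>
 Some text...
</text>
</doc>
</documents>
EOF

# Apply the command from the answer (GNU sed; -r enables extended regexes).
# Only the "<doc ..." line matches; all other lines are printed unchanged.
sed -r 's/^(<doc .* )url=".*/\1>/' "$sample"
```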

You could make the regex more careful, but sed is a good tool for this, IF the XML is completely predictable.
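One way to be more careful (my own variant, not the command above) is to delete just the two attributes instead of truncating the line. This sketch assumes attribute values are double-quoted and each <doc> start tag sits on a single line. The second <doc> in the sample is a made-up line with the attributes reordered, to show that attribute order no longer matters.

```shell
# Sample input: the first <doc> is from the question, the second is a
# hypothetical line with the attributes in a different order.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<documents>
<doc id="852" url="http://en.wikipedia.org/wiki?curid=852" title="...">
<text>
 Some text...
</text>
</doc>
<doc title="..." id="853" url="http://en.wikipedia.org/wiki?curid=853">
<text>
 Some text...
</text>
</doc>
</documents>
EOF

# Only on lines starting with "<doc ", delete every url="..." and title="..."
# attribute together with its leading space. [^"]* stops at the closing quote,
# so id (and any other attribute) survives. -E is the extended-regex flag
# accepted by both GNU sed (same as -r) and BSD sed.
sed -E '/^<doc /s/ (url|title)="[^"]*"//g' "$sample"
```

Restricting the substitution to lines that start with "<doc " keeps it from touching a url="..." that might happen to appear inside the <text> content.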

If you want to change the file in place, add -i to the sed invocation.
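A sketch of the in-place version, assuming GNU sed. Giving -i a suffix (-i.bak) also keeps a copy of the original file, which is handy for a one-shot regex edit. With GNU sed, write the flags as -r -i or -ri, not -ir: in that order the r is taken as the backup suffix instead of enabling extended regexes.

```shell
# A one-line sample file for the demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<doc id="852" url="http://en.wikipedia.org/wiki?curid=852" title="...">
EOF

# Edit in place; the untouched original is saved as "$sample.bak".
sed -r -i.bak 's/^(<doc .* )url=".*/\1>/' "$sample"

cat "$sample"
```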

Upvotes: 3
