Pawel Bala

Reputation: 93

excluding xml tag in sed - specific case

Please help! I have spent hours searching for a solution, and I keep hitting a wall. All I want to do with sed is find the tag that contains the "Number Deleted" string and remove it.

input:

    <Cell ss:StyleID="s128"/>
    <Cell ss:StyleID="s128"/>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell ss:StyleID="s81"><Data ss:Type="String">Number Deleted</Data></Cell>
    <Cell ss:StyleID="s81"/>
    <Cell ss:StyleID="s81"/>
    <Cell ss:StyleID="s81"/>
    <Cell ss:StyleID="s82"><Data ss:Type="Boolean">0</Data></Cell>
    <Cell ss:StyleID="s81"/>
    <Cell ss:StyleID="s82"><Data ss:Type="Boolean">0</Data></Cell>
    <Cell ss:StyleID="s83"><Data ss:Type="String">-1</Data></Cell>
    <Cell ss:StyleID="s81"><Data ss:Type="String">&quot;Deleted:&quot;</Data></Cell>
    <Cell ss:StyleID="s81"/>
    <Cell ss:StyleID="s81"/>
    <Cell ss:StyleID="s81"/>
   </Row>
   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Number Saved</Data></Cell>
    <Cell ss:Index="5"><Data ss:Type="Boolean">0</Data></Cell>
    <Cell ss:Index="7"><Data ss:Type="Boolean">0</Data></Cell>

output:

    <Cell ss:StyleID="s128"/>
    <Cell ss:StyleID="s128"/>
   </Row>

   <Row ss:AutoFitHeight="0">
    <Cell><Data ss:Type="String">Number Saved</Data></Cell>
    <Cell ss:Index="5"><Data ss:Type="Boolean">0</Data></Cell>
    <Cell ss:Index="7"><Data ss:Type="Boolean">0</Data></Cell>

So far I have figured out how to print the XML excluding the lines from "Number Deleted" to the end of the tag, but this breaks the XML, because the opening tag is left without its closing tag. Here is what I have:

function filter_xml
{
  # Single quotes keep the inner double quotes literal
  START='<Cell ss:StyleID="s81"><Data ss:Type="String">Number Deleted'
  END='<\/Row>'
  sed "/$START/,/$END/d" file.xml
}

Upvotes: 0

Views: 320

Answers (3)

potong

Reputation: 58463

This might work for you (GNU sed):

sed '/<Row /!b;:a;$bb;N;/.*\n[^\n]*<\/Row>/!ba;:b;/Number Deleted/d' file
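
In case the one-liner is hard to read, here is roughly the same logic as a Python sketch. It is not the sed script itself, and `drop_rows` is a name made up for illustration. Lines outside a `<Row ` block pass through, each block is buffered up to the line containing `</Row>`, and a buffered block is dropped when it contains "Number Deleted". It assumes, as in the sample input, that the opening and closing Row tags sit on separate lines.

```python
def drop_rows(lines, needle="Number Deleted"):
    """Return lines with every <Row ...>...</Row> block containing needle removed."""
    out = []
    block = None  # None while outside a Row block, else the buffered lines
    for line in lines:
        if block is None:
            if "<Row " in line:
                block = [line]        # start buffering (sed: the N loop)
            else:
                out.append(line)      # sed: /<Row /!b prints the line as-is
            continue
        block.append(line)
        if "</Row>" in line:          # block is complete
            if not any(needle in item for item in block):
                out.extend(block)
            block = None
    # Unterminated block at end of input (sed: $bb)
    if block is not None and not any(needle in item for item in block):
        out.extend(block)
    return out


# Shortened version of the input from the question
sample = [
    '    <Cell ss:StyleID="s128"/>',
    '   </Row>',
    '   <Row ss:AutoFitHeight="0">',
    '    <Cell ss:StyleID="s81"><Data ss:Type="String">Number Deleted</Data></Cell>',
    '   </Row>',
    '   <Row ss:AutoFitHeight="0">',
    '    <Cell><Data ss:Type="String">Number Saved</Data></Cell>',
]
result = drop_rows(sample)
print("\n".join(result))
```

Like the sed version, this is still line-based text matching, so it depends on the layout of the file rather than on the XML structure.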

Upvotes: 0

Julien Vivenot

Reputation: 2250

I do not think sed is the best tool for dealing with XML files.

Couldn't you actually parse the XML file?

Here is a quick and dirty example in Python:

In /tmp/data file:

<data xmlns:ss="foobar">
<Row>
<Cell ss:StyleID="s128"/>
<Cell ss:StyleID="s128"/>
</Row>
<Row ss:AutoFitHeight="0">
<Cell ss:StyleID="s81"><Data ss:Type="String">Number Deleted</Data></Cell>
<Cell ss:StyleID="s83"><Data ss:Type="String">-1</Data></Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell><Data ss:Type="String">Number Saved</Data></Cell>
<Cell ss:Index="5"><Data ss:Type="Boolean">0</Data></Cell>
</Row>
</data>

Python code:

import xml.dom.minidom as Xml

path = "/tmp/data"
xmlDoc = Xml.parse(path)
# Print every Row whose serialized form does not mention "Number Deleted"
for row in xmlDoc.getElementsByTagName("Row"):
  if "Number Deleted" not in row.toprettyxml():
    print(row.toxml())

Output:

<Row>
<Cell ss:StyleID="s128"/>
<Cell ss:StyleID="s128"/>
</Row>
<Row ss:AutoFitHeight="0">
<Cell><Data ss:Type="String">Number Saved</Data></Cell>
<Cell ss:Index="5"><Data ss:Type="Boolean">0</Data></Cell>
</Row>

Upvotes: 1

choroba

Reputation: 241948

Use an XML-aware tool. For example, xsh:

open file.xml ;
remove //Row[Cell/Data/text()='Number Deleted'] ;
save :b ;
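
If xsh is not available, the same idea (remove every Row that has a Cell/Data child whose text is exactly 'Number Deleted') can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under assumptions, not a drop-in replacement: the sample document, the `foobar` namespace URI and the helper name `remove_rows` are made up here, and the Row elements are assumed to be in no namespace. A real Excel export would most likely need namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

# Keep the "ss" prefix when serializing (URI is the placeholder from the sample)
ET.register_namespace("ss", "foobar")

SAMPLE = """<data xmlns:ss="foobar">
<Row>
<Cell ss:StyleID="s128"/>
</Row>
<Row ss:AutoFitHeight="0">
<Cell ss:StyleID="s81"><Data ss:Type="String">Number Deleted</Data></Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell><Data ss:Type="String">Number Saved</Data></Cell>
</Row>
</data>"""


def remove_rows(root, text="Number Deleted"):
    """Remove each Row with a Cell/Data child whose text equals `text`."""
    removed = 0
    # ElementTree elements have no parent pointer, so visit every
    # element and check its direct Row children.
    for parent in list(root.iter()):
        for row in list(parent.findall("Row")):
            if any(data.text == text for data in row.findall("Cell/Data")):
                parent.remove(row)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
removed = remove_rows(root)
print(ET.tostring(root, encoding="unicode"))
```

To work on a file instead, `ET.parse("file.xml")` followed by `tree.write("file.xml")` would take the place of `fromstring` and `tostring`.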

Upvotes: 1
