Reputation: 2493

How to remove a specific node from XML?

I have an XML file that contains content like this:

    <node1>
      bla
      <remove>
        abc
      </remove>
        kkk
    </node1>

I need to delete the <remove> node under <node1>. Other nodes, such as <node9>, also contain <remove>, and those should not be touched. How can I do this? An awk script, Python, or anything else is fine.

the output should be

    <node1>
      bla
        abc
        kkk
    </node1>

Upvotes: 0

Answers (4)

Chris Seymour

Reputation: 85775

Using the following input:

$ cat file
<node1>
   bla
   <remove>
     abc
   </remove>
   kkk
</node1>
<node9>
   bla
   <remove>
     abc
   </remove>
   kkk
</node9>

The following script will remove the required tag using GNU awk:

$ awk '/<node1>/{gsub(/<[/]?remove>/," ")}
       {printf "%s%s",$0,RT}' RS='</node[0-9]+>' file | grep '\S'
<node1>
   bla
     abc
   kkk
</node1>
<node9>
   bla
   <remove>
     abc
   </remove>
   kkk
</node9>

The script also works when the tags are not on separate lines:

$ cat file
<node1>bla<remove>abc</remove>kkk</node1>
<node9>bla<remove>abc</remove>kkk</node9>

$ awk '/<node1>/{gsub(/<[/]?remove>/," ")}
       {printf "%s%s",$0,RT}' RS='</node[0-9]+>' file 
<node1>bla abc kkk</node1>
<node9>bla<remove>abc</remove>kkk</node9>

Upvotes: 2

Birei

Reputation: 36252

I suggest using an XML parser. In Python, a good one is BeautifulSoup:

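from bs4 import BeautifulSoup
import sys

# The 'xml' parser requires lxml to be installed.
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'xml')

# find_all() returns a list, so the tree can be modified safely while looping.
# unwrap() drops the <remove> tag but keeps its text, as the expected output
# shows; use decompose() instead to delete the content as well.
for elem in soup.node1.find_all('remove', recursive=False):
    elem.unwrap()

print(soup)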
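
If installing a third-party package is not an option, roughly the same thing can be sketched with the standard library's xml.etree.ElementTree. This is only a sketch under some assumptions: the helper name `unwrap_children` is made up for illustration, and the sample wraps the two nodes in a hypothetical `<root>` element, because a document with several top-level elements is not well-formed XML and a strict parser will reject it.

```python
import xml.etree.ElementTree as ET


def unwrap_children(parent, tag):
    """Drop direct children named `tag`, keeping their text and sub-elements.

    Hypothetical helper: ElementTree has no built-in unwrap operation.
    """
    for child in list(parent):  # iterate over a copy, the tree is modified below
        if child.tag != tag:
            continue
        index = list(parent).index(child)
        grandchildren = list(child)
        lead = child.text or ""
        trail = child.tail or ""
        # Text that followed the removed element must follow its last
        # sub-element, or be glued to its own text if it has none.
        if grandchildren:
            grandchildren[-1].tail = (grandchildren[-1].tail or "") + trail
        else:
            lead += trail
        # Text that was inside the removed element joins whatever preceded it.
        if index == 0:
            parent.text = (parent.text or "") + lead
        else:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + lead
        parent.remove(child)
        for offset, grandchild in enumerate(grandchildren):
            parent.insert(index + offset, grandchild)


# Assumed sample input; <root> is added only to make it well-formed.
SAMPLE = """<root>
<node1>bla<remove>abc</remove>kkk</node1>
<node9>bla<remove>abc</remove>kkk</node9>
</root>"""

root = ET.fromstring(SAMPLE)
for node in list(root.iter("node1")):
    unwrap_children(node, "remove")

result = ET.tostring(root, encoding="unicode")
print(result)
```

Only <remove> elements that are direct children of <node1> are touched; the one under <node9> is left alone.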

Upvotes: 1

Jotne

Reputation: 41446

Another awk approach, which works when each tag is on its own line:

awk '/node1>/,/\/node1>/ {if ($0~/remove>/) $0=""} NF'

Upvotes: 1

Kent

Reputation: 195029

You should know that modifying XML with text-processing tools is risky. If you have to do it, this sed one-liner should work for your example and for the example in sudo's answer:

sed '/node1>/,/node1>/{/remove>/d}' file
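
To make the risk concrete, here is a small Python sketch of a line-oriented "range, then delete matching lines" filter. It is loosely modeled on the idea above and is not a reimplementation of sed; the function name `drop_remove_lines` and its parameters are made up for illustration. It behaves as wanted when every tag sits on its own line, but loses data as soon as the same XML is written on single lines.

```python
def drop_remove_lines(lines, start="node1>", end="node1>", victim="remove>"):
    """Inside a start..end line range, drop every line containing `victim`."""
    kept = []
    in_range = False
    for line in lines:
        if in_range:
            # The end pattern is only tested on lines after the one
            # that opened the range.
            if end in line:
                in_range = False
        elif start in line:
            in_range = True
        else:
            kept.append(line)  # outside any range: pass through untouched
            continue
        if victim not in line:
            kept.append(line)
    return kept


multi_line = [
    "<node1>", "  bla", "  <remove>", "    abc", "  </remove>", "  kkk", "</node1>",
    "<node9>", "  <remove>", "    abc", "  </remove>", "</node9>",
]
single_line = [
    "<node1>bla<remove>abc</remove>kkk</node1>",
    "<node9>bla<remove>abc</remove>kkk</node9>",
]

multi_result = drop_remove_lines(multi_line)
single_result = drop_remove_lines(single_line)
print("\n".join(multi_result))
print(single_result)
```

With the multi-line input only the two tag lines under <node1> disappear. With the single-line input the filter drops whole lines, text included, which is exactly the kind of surprise a real XML parser avoids.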

Upvotes: 3
