Reputation: 2493
I have an XML file with content like this:
<node1>
bla
<remove>
abc
</remove>
kkk
</node1>
I need to remove the <remove> tags under <node1> while keeping their text. Other nodes, such as <node9>, also contain <remove>, and those should not be touched. How can I do this? An awk script, Python, or anything else is fine.
The output should be:
<node1>
bla
abc
kkk
</node1>
Upvotes: 0
Views: 103
Reputation: 85775
Using the following input:
$ cat file
<node1>
bla
<remove>
abc
</remove>
kkk
</node1>
<node9>
bla
<remove>
abc
</remove>
kkk
</node9>
The following script removes the required tags using GNU awk:
$ awk '/<node1>/{gsub(/<[/]?remove>/," ")}
{printf "%s%s",$0,RT}' RS='</node[0-9]+>' file | grep '\S'
<node1>
bla
abc
kkk
</node1>
<node9>
bla
<remove>
abc
</remove>
kkk
</node9>
The script also works when the tags are not each on their own line:
$ cat file
<node1>bla<remove>abc</remove>kkk</node1>
<node9>bla<remove>abc</remove>kkk</node9>
$ awk '/<node1>/{gsub(/<[/]?remove>/," ")}
{printf "%s%s",$0,RT}' RS='</node[0-9]+>' file
<node1>bla abc kkk</node1>
<node9>bla<remove>abc</remove>kkk</node9>
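For readers less familiar with gawk's RS and RT variables, the same record-by-record idea can be sketched in Python. This is an illustration only, not part of the answer: the function name strip_remove_in_node1 and the inline sample are made up here, and like the awk script it treats the XML as plain text rather than parsing it.

```python
import re

def strip_remove_in_node1(text):
    """Blank out <remove> and </remove> tags, but only inside <node1>...</node1>."""
    def clean(match):
        # Mirrors the gsub() call: it only ever sees one matched node1 block.
        return re.sub(r"</?remove>", " ", match.group(0))

    # DOTALL lets '.' cross newlines; '.*?' stops at the first </node1>.
    cleaned = re.sub(r"<node1>.*?</node1>", clean, text, flags=re.DOTALL)
    # Like piping through grep '\S': drop lines that are now whitespace only.
    return "\n".join(line for line in cleaned.splitlines() if line.strip())

sample = """<node1>
bla
<remove>
abc
</remove>
kkk
</node1>
<node9>
bla
<remove>
abc
</remove>
kkk
</node9>"""

print(strip_remove_in_node1(sample))
```

The <node9> block is left alone because the tag substitution only runs on text matched by the <node1> pattern.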
Upvotes: 2
Reputation: 36252
I suggest using an XML parser. In Python, a good one is BeautifulSoup:
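from bs4 import BeautifulSoup
import sys

# The 'xml' parser requires lxml to be installed.
with open(sys.argv[1]) as fh:
    soup = BeautifulSoup(fh, 'xml')

# unwrap() drops the <remove> tags but keeps their text, matching the
# requested output; decompose() would delete the text as well.
for elem in soup.node1.find_all('remove', recursive=False):
    elem.unwrap()

print(soup)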
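The parser-based approach can also be sketched with the standard library's xml.etree.ElementTree, in case installing a third-party package is not an option. This is a rough sketch, not part of the answer above: the helper name unwrap_children, the <root> wrapper (XML needs a single top-level element), and the inline sample are assumptions made for illustration. It removes the <remove> tags under <node1> but keeps their text, as the question's expected output shows.

```python
import xml.etree.ElementTree as ET

def unwrap_children(parent, tag):
    """Replace each direct child named `tag` with that child's own content."""
    for child in list(parent):  # iterate over a copy: parent is mutated below
        if child.tag != tag:
            continue
        idx = list(parent).index(child)
        inner = list(child)
        text, tail = child.text or "", child.tail or ""
        if inner:
            # The child's tail now follows its last sub-element.
            inner[-1].tail = (inner[-1].tail or "") + tail
        else:
            text += tail
        # The child's leading text joins whatever precedes the child.
        if idx == 0:
            parent.text = (parent.text or "") + text
        else:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or "") + text
        parent.remove(child)
        for offset, elem in enumerate(inner):
            parent.insert(idx + offset, elem)

# Hypothetical sample: the thread's data wrapped in an assumed <root> element.
doc = """<root><node1>
bla
<remove>
abc
</remove>
kkk
</node1><node9>
bla
<remove>
abc
</remove>
kkk
</node9></root>"""

root = ET.fromstring(doc)
for node in root.findall("node1"):
    unwrap_children(node, "remove")
print(ET.tostring(root, encoding="unicode"))
```

ElementTree stores text in the text and tail attributes rather than as separate nodes, which is why the helper has to stitch the strings together by hand. The blank lines left where the tags used to be are kept; strip them afterwards if they matter.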
Upvotes: 1
Reputation: 195029
You should know that using text-processing tools to modify XML is risky. If you have to do it, this sed one-liner should work for your example and for the example in sudo's answer:
sed '/node1>/,/node1>/{/remove>/d}' file
Upvotes: 3