Reputation: 2693
I have the following (simplified) file:
<RESULTS>
<ROW>
<COLUMN NAME="TITLE">title 1</COLUMN>
<COLUMN NAME="VERSION">1,3</COLUMN>
</ROW>
<ROW>
<COLUMN NAME="TITLE">title 1</COLUMN>
<COLUMN NAME="VERSION">1,1</COLUMN>
</ROW>
<ROW>
<COLUMN NAME="TITLE">title 1</COLUMN>
<COLUMN NAME="VERSION">1,2</COLUMN>
</ROW>
</RESULTS>
What I am trying to achieve is to delete all ROW elements that match on the title, but do not match on the latest VERSION (in this case 1,3). So, what I have in mind is something like the following with sed:
sed -i '/<ROW>/,/<\/ROW>/<COLUMN NAME=\"TITLE\">title 1.*<COLUMN NAME=\"VERSION\">^1,3<\/COLUMN>/d' file
The expected output should be the following:
<RESULTS>
<ROW>
<COLUMN NAME="TITLE">title 1</COLUMN>
<COLUMN NAME="VERSION">1,3</COLUMN>
</ROW>
</RESULTS>
Unfortunately, this did not work, and neither did anything else I tried. I searched a lot for similar issues, but none of the solutions worked for me. Is there a way to achieve this with any Linux command-line utility (sed, awk, etc.)?
Thanks a lot in advance.
Upvotes: 0
Views: 181
Reputation: 58420
This might work for you (GNU sed):
sed '/<ROW>/{:a;N;/<\/ROW>/!ba;/TITLE.*title 1/!b;/VERSION.*1,3/b;d}' file
Gather up lines between <ROW> and </ROW>.
If the collected lines don't contain the correct title, bail out (the block is printed unchanged).
If the collected lines do contain the correct version, bail out as well.
Otherwise, delete the collected lines.
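As a quick check, the command can be run against the sample from the question. The following is only a sketch, assuming GNU sed and a scratch file called sample.xml (a name chosen here for illustration); the script is the same one-liner, spread over several lines so that each step can carry a comment.

```shell
# Recreate the sample input from the question in a scratch file.
cat > sample.xml <<'EOF'
<RESULTS>
<ROW>
<COLUMN NAME="TITLE">title 1</COLUMN>
<COLUMN NAME="VERSION">1,3</COLUMN>
</ROW>
<ROW>
<COLUMN NAME="TITLE">title 1</COLUMN>
<COLUMN NAME="VERSION">1,1</COLUMN>
</ROW>
<ROW>
<COLUMN NAME="TITLE">title 1</COLUMN>
<COLUMN NAME="VERSION">1,2</COLUMN>
</ROW>
</RESULTS>
EOF

out=$(sed '/<ROW>/{
  # Append following lines until the closing tag has been read.
  :a
  N
  /<\/ROW>/!ba
  # Different title: branch to the end, which prints the block unchanged.
  /TITLE.*title 1/!b
  # Wanted version: also keep the block.
  /VERSION.*1,3/b
  # Same title, other version: drop the whole block.
  d
}' sample.xml)
printf '%s\n' "$out"
```

Lines outside a ROW block never enter the braces, so the <RESULTS> and </RESULTS> lines are printed as usual and the result matches the expected output in the question.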
Upvotes: 2
Reputation: 99094
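/<ROW>/,/<\/ROW>/ won't work here, because a range address only selects lines; sed still processes them one at a time, so a pattern such as TITLE.*VERSION can never match across the separate lines of a ROW block. (The range itself is not greedy: it ends at the first line that matches /<\/ROW>/. It must also be followed by a command, not by another pattern.)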
You'll have to use one of the advanced features of sed. The simplest is probably the hold space.
This:
sed -n '/<ROW>/{h;d;};H;'
will store an entire ROW block in the hold space, and overwrite it when it encounters a new ROW block. (And print nothing.)
This:
sed -n '/<ROW>/{h;d;};H;/<\/ROW>/{g;p;}'
will store the entire ROW block, then print it out when it is complete.
This:
sed -n '/<ROW>/{h;d;};H;/<\/ROW>/{g;/title 1/!d;p;}'
will do the same, but will delete a block that does not contain "title 1".
This:
sed -n '/<ROW>/{h;d;};H;/<\/ROW>/{g;/title 1/!d;/1,3/p;}'
will do the same, but print only if the block contains "1,3". (You can spell out the matching lines more explicitly; I'm trying to keep this code concise.)
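To see what the last command produces, here is a small sketch that runs it on the sample from the question. It assumes GNU sed and a scratch file named sample.xml (the file name is made up for this example).

```shell
# Recreate the sample input from the question in a scratch file.
cat > sample.xml <<'EOF'
<RESULTS>
<ROW>
<COLUMN NAME="TITLE">title 1</COLUMN>
<COLUMN NAME="VERSION">1,3</COLUMN>
</ROW>
<ROW>
<COLUMN NAME="TITLE">title 1</COLUMN>
<COLUMN NAME="VERSION">1,1</COLUMN>
</ROW>
<ROW>
<COLUMN NAME="TITLE">title 1</COLUMN>
<COLUMN NAME="VERSION">1,2</COLUMN>
</ROW>
</RESULTS>
EOF

# -n suppresses automatic printing, so the only output comes from the
# explicit p that runs once a complete block has been copied back from
# the hold space.
out=$(sed -n '/<ROW>/{h;d;};H;/<\/ROW>/{g;/title 1/!d;/1,3/p;}' sample.xml)
printf '%s\n' "$out"
```

Because of -n, only the matching ROW block is printed; the <RESULTS> and </RESULTS> lines are not part of the output, and blocks with any other title are dropped as well. If the wrapper lines or other titles need to survive, the script would have to print them explicitly.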
Upvotes: 2