Reputation: 107
I have an XML file that looks something like this:
<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
<Date>2017-04-18</Date>
.
.
.`
</Header>
<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
<Date>2017-04-18</Date>
.
.
.`
</Header>
<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
<Date>2017-04-18</Date>
.
.
.`
</Header>
I would like to delete the opening "Header" lines (but not the closing /Header lines), starting with the 2nd occurrence - don't ask why :-). So the output should look something like this (yes, I know that it is not well-formed, but I am going to perform other processing on it as well):
<Header version= '1.0' timestamp='2017-01-04T07:10:07'>
<Date>2017-04-18</Date>
.
.
.`
</Header>
<Date>2017-04-18</Date>
.
.
.`
</Header>
<Date>2017-04-18</Date>
.
.
.`
</Header>
I tried:
sed -i '2,${/<Header/d;}' file
but that deleted all the occurrences of Header. Any suggestions?
Thanks
Upvotes: 1
Views: 1051
Reputation: 26753
sed '/<Header/{p;:a;s/^.*$//;N;s/\n//;/<Header/!p;ba}' input.txt
This assumes that each of your header tags always fits on a single line. Otherwise it gets tough. In that case, think about whether this might be an XY problem (see comment by Cyrus). I also assume that removing the indentation of the date lines is not actually wanted.
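If the single-line assumption holds, a counter in awk is another way to get the same effect while leaving indentation untouched. This is only a sketch of an alternative, not the sed command above; the sample file contents and the names tmp, seen and result are made up for illustration.

```shell
# Illustrative sample input; the real file will have more content.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<Header version='1.0'>
  <Date>2017-04-18</Date>
</Header>
<Header version='1.0'>
  <Date>2017-04-19</Date>
</Header>
EOF

# "seen" counts lines containing "<Header". The post-increment yields 0
# (false) for the first such line, so only later ones are skipped.
# "</Header>" does not contain "<Header", so closing tags are kept.
result=$(awk '/<Header/ && seen++ { next } { print }' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Because awk prints each surviving line unchanged, any leading whitespace on the Date lines is preserved.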
Upvotes: 0
Reputation: 58473
This might work for you (GNU sed):
sed '/^<\/Header/,${/^<Header/d}' file
From the first closing Header tag to the end of the file, remove any lines beginning with a Header tag.
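A quick way to check the behaviour is to run the command on a small sample. The file contents and the names tmp and result below are invented for the demonstration, and GNU sed is assumed, as above.

```shell
# Illustrative sample input shaped like the file in the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<Header version='1.0'>
<Date>2017-04-18</Date>
</Header>
<Header version='1.0'>
<Date>2017-04-19</Date>
</Header>
EOF

# The range starts at the first line beginning with "</Header" and runs
# to the last line ($). Inside it, lines beginning with "<Header" are deleted.
# The first opening tag comes before the range starts, so it survives.
result=$(sed '/^<\/Header/,${/^<Header/d}' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Only the second opening tag should be missing from the output; both closing tags and both Date lines remain.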
Upvotes: 2