Reputation: 13
I have a text file containing some recurring patterns, and I want to remove the lines between each matching pair of opening and closing patterns.
Problem: The last pattern line in the file is an opening pattern with no closing pattern after it.
Example:
Some lines
In the preamble
START
Some lines # Remove this
I wish to remove # Remove this
STOP # Remove this
Some lines
I wish to keep
START
Some other lines # Remove this
I wish to remove # Remove this
STOP # Remove this
Some lines
I wish to keep
START
Don't remove this line
Etc.
So I want to remove everything between START and STOP, but not the lines after the last occurrence of START.
I found a number of solutions with sed and awk that might have worked if my text did not have a final occurrence of the opening pattern after the last closing one (such as here), but alas they do not solve my problem.
Bonus: Ideally, I would like to delete the lines holding the closing pattern, but not the opening ones. This is not really important as I can always keep both and remove the closing ones afterwards.
I actually wish to clean the bookmarks of a huge pdf document built from the concatenation of several smaller documents that already contained several bookmarks each, to keep only the first bookmark from each original file. Any suggestions for alternatives to achieve this are also welcome.
Upvotes: 0
Views: 167
Reputation: 37464
$ awk '/START/,/STOP/{if($0=="START") a=""; else {a=a $0 ORS;next}} {print} END {printf "%s", a}' file
Some lines
In the preamble
START
Some lines
I wish to keep
START
Some lines
I wish to keep
START
Don't remove this line
Etc.
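Walk-through:
/START/,/STOP/ {            # for each line from a START through the next STOP (or end of file)
    if ($0 == "START")      # on the START line itself
        a = ""              # reset the buffer, then fall through to the print below
    else {
        a = a $0 ORS        # otherwise append the line to the buffer
        next                # and skip the print below
    }
}
{
    print                   # print every line that was not skipped
}
END {
    printf "%s", a          # flush the buffer: the lines after an unclosed final START
}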
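One caveat with the one-liner above: the buffer is only reset on a START line, so if the file happens to end after a STOP with no further START, the END block prints the last removed block back out. A possible variant, sketched here with an explicit state flag instead of a range pattern, clears the buffer on STOP as well. The function name strip_blocks and the variables buf and inblock are made up for this illustration, and the markers are assumed to be lines consisting of exactly START or STOP. Unlike the one-liner, this sketch keeps lines that sit between two START lines with no STOP in between; that is a design choice, not something the question specifies.

```shell
# Hypothetical helper: drop lines after START through the matching STOP,
# keep START lines, and keep everything after an unclosed START.
strip_blocks() {
    awk '
        # START line: emit lines buffered after an earlier unclosed START,
        # then print the START line itself and begin buffering.
        $0 == "START" { printf "%s", buf; buf = ""; inblock = 1; print; next }

        # STOP inside a block: discard the buffer and the STOP line.
        inblock && $0 == "STOP" { buf = ""; inblock = 0; next }

        # Any other line inside a block: hold it until we know how the block ends.
        inblock { buf = buf $0 ORS; next }

        # Lines outside any block are printed as they come.
        { print }

        # End of input: anything still buffered followed an unclosed START.
        END { printf "%s", buf }
    ' "$@"
}
```

It can be called as strip_blocks file > cleaned, or fed from a pipe.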
Upvotes: 1