Reputation: 3746

Delete lines between last matching patterns

First of all, I am aware of the similar questions already asked here. My question is a bit different. Given the text below, which comes from a file, file1:

Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep
Pattern 1
REMOVE ME
AND ME
ME TOO PLEASE
Pattern 2

How can I remove only the text between the last Pattern 1 and Pattern 2, including the pattern lines themselves, so that file1 now contains:

Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep

I would prefer a solution with sed, but any other solution (perl, bash, awk) would do just fine.

Upvotes: 1

Answers (4)

potong

Reputation: 58381

This might work for you (GNU sed):

sed '/Pattern 1/,${//{x;//p;x;h};//!H;$!d;x;s/.*Pattern 2[^\n]*\n\?//;/^$/d}' file

The general idea here is to gather up each block of lines that starts at a line containing Pattern 1, and then either flush that block when another line containing Pattern 1 is encountered or, at end-of-file, remove the lines between Pattern 1 and Pattern 2 and print what is left over.

Focus on the lines between the first line containing Pattern 1 and the end-of-file; all other lines are printed as normal. If a line contains Pattern 1, swap to the hold space and, if the lines held there also contain the same regexp, print them; then swap back and replace the hold space with the current line. If the current line does not contain the regexp, append it to the hold space. Unless it is the last line of the file, delete the current line. At end-of-file, swap to the hold space, remove everything up to and including the line containing Pattern 2, and print what remains.

N.B. A tricky situation arises, as in your example, when the line containing Pattern 2 is the last line of the file. As sed uses newlines to delimit lines, it strips the newline before placing a line into the pattern space and appends one again before printing. If the pattern space is empty when it is printed, sed still appends a newline, which in this case would add a spurious blank line. The solution is to remove the lines between Pattern 1 and Pattern 2 together with any newline following the line containing Pattern 2. If there are additional lines, these are printed as normal. If nothing followed, the pattern space (the former hold space contents) is now empty and, since it must have contained something before, it can safely be deleted.
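For readability, the same script can be laid out one command per line with comments. This is only a re-spacing of the one-liner above, and it still needs GNU sed (for \n inside a bracket expression and for \?). The function name drop_last_block is made up for the example.

```shell
# drop_last_block FILE: hypothetical wrapper around the GNU sed script.
drop_last_block() {
  sed '
  /Pattern 1/,$ {
    # The current line contains Pattern 1, so a new block starts here.
    // {
      # Fetch the previously held block from the hold space,
      x
      # print it if it really is a block (it contains Pattern 1),
      //p
      # then swap back and make the current line the new held block.
      x
      h
    }
    # Any other line is appended to the held block.
    //!H
    # Except on the last line, print nothing yet.
    $!d
    # Last line: fetch the held block and cut everything up to and
    # including the Pattern 2 line, plus the newline after it if any.
    x
    s/.*Pattern 2[^\n]*\n\?//
    # Nothing left: delete instead of printing an empty line.
    /^$/d
  }' "$1"
}
```

Calling drop_last_block file1 prints the trimmed text to standard output; file1 itself is left untouched.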

Upvotes: 1

ghoti

Reputation: 46836

I can't think of a way to do this simply and elegantly in sed alone. It might be possible to do this with sed using write-only code, but I'd need a really good reason to write something like that. :-)

You still might be able to use sed for this in conjunction with other tools:

$ tac test.txt | sed '/^Pattern 2$/,/^Pattern 1$/d' | tac
Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep

If your system doesn't have a tac on it, you can create one with:

$ alias tac="awk '{L[i++]=\$0} END {for(j=i-1;j>=0;)print L[j--]}'"

or in keeping with the theme:

$ alias tac='sed '\''1!G;h;$!d'\'

That said, I'd do this in awk, like so:

$ awk '/Pattern 1/{printf "%s",b;b=""} {b=b $0 ORS} /Pattern 2/{b=""} END{printf "%s",b}' test.txt
Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep

Or split out for easier reading/commenting:

awk '
  /Pattern 1/ {          # If we find the start pattern,
    printf "%s",b        # print the buffer (or nothing if it's empty)
    b=""                 # and empty the buffer.
  }
  {                      # Add the current line to a buffer, with the
    b=b $0 ORS           # correct output record separator.
  }
  /Pattern 2/ {          # If we find our close pattern,
    b=""                 # just empty the buffer.
  }
  END {                  # And at the end of the file,
    printf "%s",b        # print the buffer if we have one.
  }' test.txt

This is roughly the same as hek2mgl's solution, but orders things a little more reasonably and uses ORS. :-)

Note that both of these solutions do what the question asks only if Pattern 2 occurs once in the file. If you have multiple blocks, i.e. several stretches with both start and end patterns, every Pattern 2 removes the text back to the nearest preceding Pattern 1, not just the last one in the file, so you'll need to work a little harder. If this is the case, please provide more detail in your question.
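If the file may hold several such blocks and only the final one should go, one option is to read the file twice: a first pass finds the line numbers of the last Pattern 2 and of the nearest Pattern 1 before it, and a second pass prints everything outside that range. This is a minimal sketch, assuming the input is a regular file that can be read twice (not a pipe); remove_final_block is a made-up name.

```shell
# remove_final_block FILE: hypothetical helper, prints FILE without the
# last "Pattern 1" .. "Pattern 2" stretch.
remove_final_block() {
  awk '
    NR == FNR {                      # first pass over the file
      if (/Pattern 1/) last1 = FNR   # most recent start line seen so far
      if (/Pattern 2/ && last1) {    # an end line with a start line before it
        start = last1
        stop = FNR
      }
      next
    }
    FNR < start || FNR > stop        # second pass: print lines outside the range
  ' "$1" "$1"
}
```

If no Pattern 2 follows a Pattern 1 anywhere in the file, start and stop stay unset and every line is printed unchanged.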

Upvotes: 2

hek2mgl

Reputation: 157947

With awk:

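awk '
# On pattern 1 and when the buffer is not empty, flush the buffer
/Pattern 1/ && b!="" { printf "%s", b; b="" }

# Append the current line and a newline to the buffer
{ b=b""$0"\n" }

# Clean the buffer on pattern 2
/Pattern 2/ { b="" }

# At the end of the file, print whatever is still buffered
# (e.g. text after Pattern 2), which would otherwise be lost
END { printf "%s", b }' file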
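This prints the result to standard output, whereas the question asks for file1 itself to end up changed. awk has no portable in-place option, so a common workaround is to write to a temporary file and copy it back. A minimal sketch using the same buffering idea, including an END block that prints whatever is still buffered at end of file; the function name is made up for the example, and mktemp is assumed to be available.

```shell
# rewrite_without_last_block FILE: hypothetical helper, edits FILE in place.
rewrite_without_last_block() {
  tmp=$(mktemp) || return 1
  # Write to a temporary file first: redirecting straight onto "$1" would
  # truncate the file before awk had read it.
  awk '
    /Pattern 1/ && b != "" { printf "%s", b; b = "" }
    { b = b $0 "\n" }
    /Pattern 2/ { b = "" }
    END { printf "%s", b }
  ' "$1" > "$tmp" &&
    cat "$tmp" > "$1"    # copy back, keeping the original file permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Copying the contents back with cat, rather than moving the temporary file over the original, keeps the original file's permissions and any hard links to it.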

Upvotes: 1

choroba

Reputation: 241828

perl -ne 'if    (/Pattern 1/) { print splice @buff; push @buff, $_ }
          elsif (/Pattern 2/) { @buff = () }
          elsif (@buff)       { push @buff, $_ }
          else                { print }
' -- file

When you see Pattern 1, output any lines accumulated so far and start pushing lines into @buff. When you see Pattern 2, clear the buffer. If the buffer has been started, push any other line to it; otherwise print it (text before the first Pattern 1 or after Pattern 2).

Note: The behaviour of Pattern 2 without previous Pattern 1 was not specified.

Upvotes: 2
