user1332577

Reputation: 362

Delete n lines between two matching patterns, keeping the first match and deleting the second match

Given data in a text file:

string1 EP00 37.45 83.83 
save
save
save
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
string2

gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish
gibberish

I would like to use sed or awk to match both string1 and string2, keep string1 and the three lines after it, and delete everything else up to and including string2 (string2 should go, string1 should stay). I would also like to delete the one extra line between string2 and the next block of text. So the expected output would be:

string1 EP00 37.45 83.83 
save
save
save

There are always the same number of lines in between the two patterns if that helps (16). I would like to do this with sed or awk, but have only been able to figure out a script to delete the entire block of data between the two, holding onto both strings:

sed '/string1/,/string2/{//!d}' file >> tr.txt

Does anyone know how to specify to retain string1 and the three lines after it and delete the rest of the lines in between the two patterns including string2? I would like to do this with sed or awk, whichever is easier.

Thanks!

Upvotes: 3

Views: 892

Answers (5)

potong

Reputation: 58478

This might work for you (GNU sed):

sed -rn '/string1/{h;d};H;/string2/{x;s/(string1([^\n]*\n){4}).*string2.*/\1/p}' file

Upvotes: 0

BMW

Reputation: 45293

Using GNU sed

sed -n '/^string1/,+3p' file

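Without GNU sed, try this (each N appends the next input line to the pattern space, so the match and the three lines after it are printed together):

sed -n '/string1/{N;N;N;p;}' file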

Upvotes: 0

Floris

Reputation: 46415

If you want to do this with awk, the script below is one way. It has been updated based on your comments: it now "recycles", so it matches correctly however many times the string1-string2 pattern occurs. I realize you have already accepted an answer, but I wanted to give you this alternative; it is much less "professional" than @anubhava's answer, but it might give you some insight into how to make awk do "anything you want", even if you are not a pro:

BEGIN {
    state = 0;
}
{
    if ($1 == "string1") {
        state = 1;
    }
    if (state == 1) {
        state = 2;
        print;
        next;
    }
    if (state > 1 && state < 5) {
        print;
        state = state + 1;
        next;
    }
    if ($1 == "string2") {
        state = 6;
        next;
    }
    if (state == 6) {
        state = 0;
        next;
    }
    if (state == 0) {
        print;
        next;
    }
}

The state variable basically tells you "where am I in the logic". The states are:

0: "normal state", print the line, go to the next
1: "found string1", start printing this line and the next three
2 - 4: printing "the lines that followed string1"
5: Waiting for string2, not printing anything
6: found string2, need to delete the next line
   Having found the next line, we reset the state to 0 again…

You would run it with

awk -f scriptFile.awk inputfile.txt > outputfile.txt

I made this "pedestrian", so you can see exactly what is done, and in what order. Let me know if you have any questions.
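For comparison, the same state machine can be squeezed into a few pattern-action rules with counters. This is only a sketch under assumptions: the variable names keep, skip and drop_next are mine, the sample data is made up to mirror the question, and it assumes the markers are exactly the first field of their lines.

```shell
# Build a small sample file (made-up data shaped like the question's).
tmp=$(mktemp)
printf '%s\n' 'before' 'string1 EP00 37.45 83.83' 'save' 'save' 'save' \
    'gibberish' 'gibberish' 'string2' '' 'after' > "$tmp"

result=$(awk '
    $1 == "string1" { keep = 4; skip = 1 }             # this line plus the next three
    keep > 0        { print; keep--; next }            # still inside the kept lines
    $1 == "string2" { skip = 0; drop_next = 1; next }  # drop string2 itself
    drop_next       { drop_next = 0; next }            # drop the one line after string2
    !skip           { print }                          # outside a block: pass through
' "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

The rule order matters: the keep rule runs before the string2 test, just as the state 2-4 check runs before the string2 check in the longer script.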

Upvotes: 2

anubhava

Reputation: 785631

You can use this awk:

awk '/^string1/{i=0} /^string1/,/^string2/{i++; if (i<5) print; next}1' file
string1 EP00 37.45 83.83 
save
save
save
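Note that the range in the command above ends at string2, so the single line that follows string2 is still passed through by the trailing 1. If that line should go as well, one possible variant is sketched below. It is an assumption-laden sketch, not part of the original answer: the drop flag is a name I made up, the sample data is invented, and it assumes at least three lines sit between string1 and string2 and that exactly one unwanted line follows string2.

```shell
# Made-up sample: one block with text before and after it.
tmp=$(mktemp)
printf '%s\n' 'before' 'string1 EP00 37.45 83.83' 'save' 'save' 'save' \
    'gibberish' 'gibberish' 'string2' '' 'after' > "$tmp"

result=$(awk '
    drop { drop = 0; next }              # skip the one line right after string2
    /^string1/ { i = 0 }                 # restart the counter for each block
    /^string1/,/^string2/ {
        i++
        if (i < 5) print                 # string1 and the three lines after it
        if (/^string2/) drop = 1         # remember to skip the following line
        next
    }
    1                                    # everything outside a block is printed
' "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```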

Upvotes: 5

twalberg

Reputation: 62459

Something like this:

sed -n '/^string1/,$p' file | sed '5,$d' > output

The first sed prints from the first line starting with "string1" through the end of the file, which removes everything from line 1 up to the line preceding it. The second deletes from line 5 of that stream (the fourth line after "string1") to the end, leaving "string1" and the three lines after it.

You could also do this, if your version of grep supports it:

grep -A3 "^string1" file > output
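If your grep has no -A option, a roughly equivalent awk idiom is a countdown counter. The sketch below is illustrative only: the counter name n and the two-block sample data are my own. One difference worth knowing: GNU grep, as far as I know, prints a "--" separator line between non-adjacent groups of context, whereas this awk version prints nothing between blocks.

```shell
# Made-up sample with two blocks, to show the behaviour on repeated matches.
tmp=$(mktemp)
printf '%s\n' 'string1 A' 'save' 'save' 'save' 'gibberish' 'string2' '' \
    'string1 B' 'save' 'save' 'save' 'gibberish' 'string2' > "$tmp"

# On each match set the counter to 4; print while it is still positive.
result=$(awk '/^string1/ { n = 4 } n-- > 0' "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

Like the grep command, this keeps only the matched line and the three lines after it and discards everything else in the file.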

Upvotes: 0
