drake7788

Reputation: 23

Grep lines matching a pattern, plus the lines before and after the match up to different patterns

Start_pattern
abc
d End_pattern
Start_pattern
abc
d
ef
ghij 
klm
no End_pattern
Start_pattern
abc
def
hij End_pattern
Start_pattern
abc
dhi
jklm End_pattern

Desired Output:

Print every block that contains the search pattern, from Start_pattern through End_pattern, both inclusive.

Start_pattern
abc
d
ef
ghij 
klm
no End_pattern
Start_pattern
abc
def
hij End_pattern

In the above file I want to search for "ef" and print the lines between "Start_pattern" and "End_pattern".

  1. I have tried grep -B[NUM] and -A[NUM], which are not useful because there can be an unknown number of lines between the search pattern "ef" and "Start_pattern" or "End_pattern".
  2. grep, sed, awk: anything is welcome, preferably a one-liner.
  3. sed -n '/BEGIN/,/END/p' * works to print the lines between the search pattern "ef" and End_pattern, but I am not able to print the lines between Start_pattern and "ef".
  4. There are multiple files, each with multiple occurrences of the search pattern.

Upvotes: 2

Views: 679

Answers (4)

Ed Morton

Reputation: 203209

$ cat tst.awk
/Start_pattern/ { fnd=1; buf="" }
fnd {
    buf = buf $0 ORS
    if (/End_pattern/) {
        if (buf ~ /ef/) {
            printf "%s", buf
        }
        fnd = 0
        buf = ""
    }
}

$ awk -f tst.awk file
Start_pattern
abc
d
ef
ghij
klm
no End_pattern
Start_pattern
abc
def
hij End_pattern

Upvotes: 1

hek2mgl

Reputation: 157947

With gawk, which supports multi char RS:

gawk 'BEGIN{RS=ORS="End_pattern"}/ef/' file

Output:

Start_pattern
abc
d
ef
ghij 
klm
no End_pattern
Start_pattern
abc
def
hij End_pattern

Explanation:

# Split records based on the End_pattern
BEGIN{RS=ORS="End_pattern"}

# Print records that contain the search term
/ef/

Btw, for cosmetic reasons you might want to append a newline at the end of the output:

gawk 'BEGIN{RS=ORS="End_pattern"}/ef/;END{printf "\n"}' file

PS: While the above solution works with gawk only, the same result can be achieved with a simple POSIX-compatible awk script, meaning it works with any awk:

awk '{b=b$0"\n"}/End_pattern/{if(b~/ef/){printf "%s",b};b=""}' file

Explanation:

# Append the current line plus a newline to b(uffer)
{b=b$0"\n"}

# Once End_pattern is found ...
/End_pattern/{
    # Check if the buffer contains the search term
    if(b~/ef/){
        # Print the buffer when the term was found
        printf "%s",b
    }
    # Clear the buffer
    b=""
}
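The buffer approach explained above hard-codes its patterns. As a minimal sketch, they could instead be passed in as awk variables and matched as dynamic regexes. The variable names endpat and want, and the sample file sample.txt, are illustrative assumptions rather than part of the original answer.

```shell
# Hypothetical sample input; the file name sample.txt is an assumption.
tmp=$(mktemp -d)
printf '%s\n' 'Start_pattern' 'abc' 'd End_pattern' \
              'Start_pattern' 'abc' 'def' 'hij End_pattern' > "$tmp/sample.txt"

# endpat and want are illustrative names, used as dynamic regexes via "~".
out=$(awk -v endpat='End_pattern' -v want='ef' '
{ b = b $0 "\n" }                   # append every line to the buffer
$0 ~ endpat {                       # the block is complete
    if (b ~ want) printf "%s", b    # print it only if it contains the term
    b = ""                          # clear the buffer for the next block
}' "$tmp/sample.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Only the second block contains ef (inside def), so only that block should be printed.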


Upvotes: 2

kvantour

Reputation: 26471

Just for completeness, here is a sed solution:

sed -n '/Start_pattern/{:a;N;/End_pattern/!ba;/ef/p}' file

To understand this, think of labels and branches as goto statements:

  • If Start_pattern is found, execute what is between {...}
  • Define a label a with :a
  • Append the next line to the current record (N)
  • If End_pattern is not found in the record, go back to label a (/End_pattern/!ba)
  • Once End_pattern is found, execute the last part: if the full record contains ef, print the record (/ef/p)
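The steps above can be sketched as a commented multi-line sed script. Putting one command per line avoids a semicolon after the label, which some sed implementations do not accept. The sample file sample.txt is a hypothetical stand-in for the real input, and the script uses the lowercase End_pattern spelling from the question's data.

```shell
# Hypothetical sample input; the file name sample.txt is an assumption.
tmp=$(mktemp -d)
printf '%s\n' 'Start_pattern' 'abc' 'd End_pattern' \
              'Start_pattern' 'abc' 'ef' 'no End_pattern' > "$tmp/sample.txt"

out=$(sed -n '
/Start_pattern/ {
  # label a: top of the accumulate loop
  :a
  # append the next input line to the pattern space
  N
  # no End_pattern in the pattern space yet: branch back to a
  /End_pattern/!ba
  # the block is complete; print it only if it contains ef
  /ef/p
}' "$tmp/sample.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```

The first block has no ef and is dropped; the second block is printed in full.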

Upvotes: 2

Sundeep

Reputation: 23667

Adapted from my answer on another site: "Get text between start pattern and end pattern based on pattern between start and end pattern"

$ awk '/Start_pattern/{f=1; m=0; buf = $0; next}
       /ef/ && f{m=1}
       f{buf = buf ORS $0}
       /End_pattern/ && f{f=0; if(m==1)print buf}
      ' ip.txt
Start_pattern
abc
d
ef
ghij 
klm
no End_pattern
Start_pattern
abc
def
hij End_pattern

  • /Start_pattern/{f=1; m=0; buf = $0; next}: set the flag to mark the start of a block, clear the match flag, initialize the buffer, and move on to the next line
  • /ef/ && f{m=1}: if the line contains ef, set the match flag; f is used to avoid matching ef outside of Start_pattern...End_pattern
  • f{buf = buf ORS $0}: as long as the flag is set, accumulate input lines
  • /End_pattern/ && f{f=0; if(m==1)print buf}: at the end of the block, print the buffer if a match was found
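Since the question mentions multiple files, here is a minimal sketch of the same flag-and-buffer logic with one extra rule that resets the flag at the start of each file. This keeps a block left unterminated at the end of one file from being completed by an End_pattern in the next. The FNR == 1 reset and the file names a.txt and b.txt are assumptions added for illustration, not part of the original answer.

```shell
# Hypothetical inputs: a.txt ends with an unterminated block,
# b.txt starts with a stray End_pattern line.
tmp=$(mktemp -d)
printf '%s\n' 'Start_pattern' 'ef here' 'x End_pattern' \
              'Start_pattern' 'ef dangling' > "$tmp/a.txt"
printf '%s\n' 'stray End_pattern' \
              'Start_pattern' 'def' 'y End_pattern' > "$tmp/b.txt"

out=$(awk '
FNR == 1          { f = 0 }                          # new file: drop any open block
/Start_pattern/   { f = 1; m = 0; buf = $0; next }   # start a block
f && /ef/         { m = 1 }                          # remember that the term was seen
f                 { buf = buf ORS $0 }               # accumulate lines of the block
f && /End_pattern/ { f = 0; if (m) print buf }       # block done: print if matched
' "$tmp/a.txt" "$tmp/b.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Without the FNR == 1 rule, the dangling block from a.txt would be closed by the first line of b.txt and printed as if it were one block.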

Upvotes: 1
