Bartosz

Reputation: 53

How to select lines between two similar patterns

I have a text file that contains text blocks like this:

IN
hit
ER 123 hit 456
abc
hit
ghi
ER 789 hit 012
abc
ghi
IN 345 
abc
def
ghi
ER 678 xxx 901
xyz
hit
xyz
IN
risk
in

Blocks can have any number of lines, but they always start with a line containing ER or IN.

Using awk, how can I select the lines that occur between two similar marker patterns?

1) There may be multiple sections marked with these patterns.

2) At least one of the selected lines between the patterns must contain another pattern (e.g. hit).

3) The line with the first pattern (e.g. ER) should be included; the line with the second one (e.g. ER|IN) should be excluded.

Expected output:

ER 123 hit 456
abc
hit
ghi
ER 678 xxx 901
xyz
hit
xyz

I've tried to achieve my goal with

awk '/ER/ {block=1} block {str=str sep $0; sep=RS} /ER|IN/ {block=0; if (str~/hit/) {print str} str=sep=""}'

but it gives me

ER abc hit ghi
ER 789 hit 012

EDIT: my example wasn't precise enough.

EDIT2:

a) I look for a line matching the pattern " ER ".

b) I search for the nearest following line matching the pattern " ER " or " IN ".

c) I want to print the result only if it contains at least one line matching the pattern ".hit.", and that line can't be the first line.

The result should include the first line but exclude the last line, so:

ER 678 xxx 901
xyz
hit
xyz

should be printed, because there is a line matching hit in the block between the line matching " ER " and the line matching " IN ", while

ER 789 hit 012
abc
ghi

shouldn't be printed, because there is no line matching hit in the block between the line matching " ER " and the line matching " IN ".

Upvotes: 3

Views: 140

Answers (3)

RavinderSingh13

Reputation: 133518

Could you please try the following and let me know if it helps.

awk '
/ER/ && val {
  if (hit_flag) {
    print val
  }
  val = hit_flag = token = in_er_token = ""
}
/ER/ && !val {
  val = $0
  token = 1
  next
}
val && token && /[Hh][Ii][Tt]/ {
  hit_flag = 1
}
val && token && (/ER/ || /[Ii][Nn]/) {
  if (val) {
    in_er_token = 1
  }
  next
}
!in_er_token {
  val = val ? val ORS $0 : $0
}
END {
  if (val && hit_flag) {
    print val
  }
}
' Input_file

Upvotes: 2

James Brown

Reputation: 37404

Using GNU awk with RT:

$ awk 'BEGIN{RS="(ER|IN)"}NR==1{rt=RT}{ORS=RT}/\nhit/{print (NR==2?rt:"")$0}' file
ER 123 hit 456
abc
hit
ghi
ER 678 xxx 901
xyz
hit
xyz

Explained:

$ awk '
BEGIN { RS="(ER|IN)" }      # record separator is ER or IN
NR==1 { rt=RT }             # special handling if hit is in the second record
{ ORS=RT }                  # set matched RS as ORS
/\nhit/ {                   # hit in the record
    print (NR==2?rt:"") $0  # output with special handling for NR==2
}' file
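
To see how the regex RS splits the input, a small sketch that prints every record next to the RT that terminated it may help. It needs GNU awk, since RT is a gawk extension, and show_records is only a hypothetical helper name used for illustration.

```shell
# Hypothetical helper: dump each record together with its terminator.
show_records() {
  gawk '
    BEGIN { RS = "(ER|IN)" }     # same record separator as above
    # RT holds the text that matched RS at the end of the current record
    { printf "record %d: [%s] RT=[%s]\n", NR, $0, RT }
  ' "$@"
}
```

Run it as show_records file. When the input begins with a marker, the first record is empty and the marker itself is only available as the RT of that first record, which is why the one-liner saves it at NR==1.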

The patterns for ER, IN and hit could be tighter. Keep that in mind when you adapt this to your actual data.
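
As one way to tighten the patterns, here is a minimal line-oriented sketch in portable awk (no RT needed). It assumes, and this is my assumption rather than something the question states, that a marker is a line whose first field is exactly ER or IN, and that hit has to appear as a whole word on a line after the ER line. select_blocks and emit_block are just illustrative names.

```shell
# Hypothetical helper: print ER-headed blocks that contain "hit" after the first line.
select_blocks() {
  awk '
    # Print the buffered block if it began with ER and a later line had hit.
    function emit_block() {
      if (in_er && has_hit) print block
      block = ""; in_er = 0; has_hit = 0
    }
    # Marker line (assumption: first field is exactly ER or IN).
    $1 == "ER" || $1 == "IN" {
      emit_block()                      # close the previous block, if any
      if ($1 == "ER") { in_er = 1; block = $0 }
      next                              # the marker line itself never counts as a hit
    }
    # Body line of an ER block: buffer it and look for hit as a whole word.
    in_er {
      block = block ORS $0
      if ($0 ~ /(^|[^A-Za-z0-9_])hit([^A-Za-z0-9_]|$)/) has_hit = 1
    }
  ' "$@"
}
```

Call it as select_blocks file. A block that is never closed by a following ER or IN line is deliberately not printed here; add an END { emit_block() } rule if you want it.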

Upvotes: 2

RomanPerekhrest

Reputation: 92854

Awk solution:

awk '/^(ER|IN)\>/{
         if (f) { if (r ~ /\<hit\>/) print head, r }
         f=1; head=$0; r=""; next
     }
     f{ r=r ORS $0 }' file

The output:

ER 123 hit 456 
abc
hit
ghi
ER 678 hit 901 
xyz
hit
xyz

Upvotes: 1
