Reputation: 53
I have a text file that contains text blocks like this:
IN
hit
ER 123 hit 456
abc
hit
ghi
ER 789 hit 012
abc
ghi
IN 345
abc
def
ghi
ER 678 xxx 901
xyz
hit
xyz
IN
risk
in
Blocks can have any number of lines, but always start with a line containing ER or IN.
Using awk, how can I select the lines that occur between two such marker patterns?
1) There may be multiple sections marked with these patterns.
2) At least one of the selected lines between the patterns must contain another pattern (e.g. hit).
3) The line with the first pattern (e.g. ER) should be included; the line with the second one (e.g. ER|IN) should be excluded.
Expected output:
ER 123 hit 456
abc
hit
ghi
ER 678 xxx 901
xyz
hit
xyz
I've tried to achieve my goal with
awk '/ER/ {block=1} block {str=str sep $0; sep=RS} /ER|IN/ {block=0; if (str~/hit/) {print str} str=sep=""}'
but it gives me
ER abc hit ghi
ER 789 hit 012
EDIT: my example wasn't precise enough.
EDIT2:
a) I look for a line matching the pattern " ER ".
b) I search for the nearest following line matching the pattern " ER " or " IN ".
c) I want to print the result only if it contains at least one line matching the pattern ".hit.", and that line can't be the first line.
The result should include the first line but exclude the last line, so:
ER 678 xxx 901
xyz
hit
xyz
should be printed, because there is one line matching hit in the block between the line matching " ER " and the line matching " IN ".
ER 789 hit 012
abc
ghi
shouldn't be printed, because there is no line matching hit in the block between the line matching " ER " and the line matching " IN " (the hit on the first line doesn't count).
Upvotes: 3
Views: 140
Reputation: 133518
Could you please try the following and let me know if it helps?
awk '
/ER/ && val{
  if(hit_flag){
    print val
  }
  val=hit_flag=token=in_er_token=""
}
/ER/ && !val{
  val=$0
  token=1
  next
}
val && token && (/[Hh][Ii][Tt]/){
  hit_flag=1
}
val && token && (/ER/ || /[Ii][Nn]/){
  if(val){
    in_er_token=1
  }
  next
}
!in_er_token{
  val=val?val ORS $0:$0
}
END{
  if(val && hit_flag){
    print val
  }
}
' Input_file
Upvotes: 2
Reputation: 37404
Using GNU awk with RT:
$ awk 'BEGIN{RS="(ER|IN)"}NR==1{rt=RT}{ORS=RT}/\nhit/{print (NR==2?rt:"")$0}' file
ER 123 hit 456
abc
hit
ghi
ER 678 xxx 901
xyz
hit
xyz
Explained:
$ awk '
BEGIN { RS="(ER|IN)" } # record separator is ER or IN
NR==1 { rt=RT } # special handling if hit is in the second record
{ ORS=RT } # set matched RS as ORS
/\nhit/ { # hit in the record
print (NR==2?rt:"") $0 # output with special handling for NR==2
}' file
The definitions of ER, IN and hit could be tighter. Keep that in mind when you adapt this to your actual needs.
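One way to tighten those definitions, as a minimal sketch rather than a definitive solution, is to compare whole fields instead of matching substrings. It assumes that the ER and IN markers are always the first whitespace-separated field of their line and that hit must appear as a field of its own; both are assumptions about the real data, not something the question confirms. The names select_blocks and has_hit are made up for this illustration. Only POSIX awk features are used, so RT is not needed here.

```shell
# Sample data from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
IN
hit
ER 123 hit 456
abc
hit
ghi
ER 789 hit 012
abc
ghi
IN 345
abc
def
ghi
ER 678 xxx 901
xyz
hit
xyz
IN
risk
in
EOF

# select_blocks is a hypothetical helper name used only in this sketch.
select_blocks() {
  awk '
    # Return 1 if any field of the current line is exactly "hit".
    function has_hit(   i) {
      for (i = 1; i <= NF; i++) if ($i == "hit") return 1
      return 0
    }
    # Assumption: a marker line has ER or IN as its first field.
    # A marker closes the block that is currently open, if any.
    $1 == "ER" || $1 == "IN" {
      if (inblock && found) print block
      inblock = found = 0
      # Only ER opens a new block; the marker line becomes its first line.
      if ($1 == "ER") { inblock = 1; block = $0 }
      next
    }
    # Inside an open block: collect the line and note whether hit was seen.
    # The first line never reaches this rule, so a hit there does not count.
    inblock {
      block = block ORS $0
      if (has_hit()) found = 1
    }
  ' "$1"
}

out=$(select_blocks "$input")
printf '%s\n' "$out"
rm -f "$input"
```

A block that is still open at the end of the input is dropped in this sketch, because no closing ER or IN line was seen. If such a block should be printed as well, an END rule with the same `inblock && found` check would be one way to do it.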
Upvotes: 2
Reputation: 92854
Awk solution:
awk '/^(ER|IN)\>/{
    if (f && head ~ /^ER\>/ && r ~ /\<hit\>/) print head r
    f=1; head=$0; r=""; next
}
f{ r=r ORS $0 }' file
The output:
ER 123 hit 456
abc
hit
ghi
ER 678 xxx 901
xyz
hit
xyz
Upvotes: 1