Reputation: 151
What I want to do is simply search and print groups of 3 consecutive lines in the following file:
C30 1.86494717 7.48500210 9.88662475
O86 1.23405589 6.84423578 21.24967645
O88 5.28196032 8.12576842 21.24967645
O90 3.01950053 8.12576842 3.03566806
C32 8.01630633 7.48500210 15.95796089
O92 1.07505084 8.12576842 9.10700419
O94 7.22641001 8.12576842 15.17834032
O96 6.07185664 6.20346947 22.02929701
xxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxxx
O111 3.82376560 6.83952632 25.21182108
H29 3.45376598 7.57952642 25.95182118
H30 4.93376561 6.83952632 25.21182108
O112 2.46658853 6.91893543 28.05848681
H31 2.09658891 7.65893553 28.79848692
H32 3.57658854 6.91893543 28.05848681
O113 6.25457469 6.74244996 26.28735053
H33 5.88457507 7.48245006 27.02735064
H34 7.36457470 6.74244996 26.28735053
In this case I want to find the lines that follow the pattern "O", "H", "H":
Ox
Hx
Hx
I tried something with grep, but it didn't work properly.
Any suggestions?
Many thanks in advance.
Upvotes: 3
Views: 109
Reputation: 203995
awk '
{ k = substr($0,1,1) }
(k=="H") && (prevNR["H"]==(NR-1)) && (prevNR["O"]==(NR-2)) {
print prevRec["O"] ORS prevRec["H"] ORS $0
}
{ prevNR[k]=NR; prevRec[k]=$0 }
' file
O111 3.82376560 6.83952632 25.21182108
H29 3.45376598 7.57952642 25.95182118
H30 4.93376561 6.83952632 25.21182108
O112 2.46658853 6.91893543 28.05848681
H31 2.09658891 7.65893553 28.79848692
H32 3.57658854 6.91893543 28.05848681
O113 6.25457469 6.74244996 26.28735053
H33 5.88457507 7.48245006 27.02735064
H34 7.36457470 6.74244996 26.28735053
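To check how the per-letter bookkeeping in the script above behaves when O lines repeat, here is a small sketch that wraps the same awk program in a shell function and runs it on made-up input. The function name find_ohh and the sample lines are assumptions for illustration only; they are not part of the original answer.

```shell
# find_ohh: hypothetical wrapper around the awk script above; reads stdin.
find_ohh() {
  awk '
    { k = substr($0, 1, 1) }                # first character of the current line
    k == "H" && prevNR["H"] == NR-1 && prevNR["O"] == NR-2 {
      print prevRec["O"] ORS prevRec["H"] ORS $0
    }
    { prevNR[k] = NR; prevRec[k] = $0 }     # remember the latest line per letter
  '
}

# Made-up sample: two O lines in a row, followed by three H lines.
sample='O1 a
O2 b
H1 c
H2 d
H3 e'

result=$(printf '%s\n' "$sample" | find_ohh)
printf '%s\n' "$result"
```

On this input only the group starting at the second O line should be reported, since the first O line is not directly followed by an H line and the third H line is no longer two lines after an O line.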
Upvotes: 2
Reputation: 2999
gawk -v RS='(^|\n)O[^\n]*\nH[^\n]*\nH[^\n]*' '{print RT}' file
In RS, ^ matches the beginning of the file, not the beginning of every line (this may be a dark corner). RT is the text that matched RS.
You need GNU Awk for this; standard Awk doesn't allow regex record separators.
Upvotes: 3
Reputation: 42087
Using a newer version of GNU grep that has the -z option to match multiline input:
$ grep -Pzo 'O[^\n]+\nH[^\n]+\nH[^\n]+' file.txt
O111 3.82376560 6.83952632 25.21182108
H29 3.45376598 7.57952642 25.95182118
H30 4.93376561 6.83952632 25.21182108
O112 2.46658853 6.91893543 28.05848681
H31 2.09658891 7.65893553 28.79848692
H32 3.57658854 6.91893543 28.05848681
O113 6.25457469 6.74244996 26.28735053
H33 5.88457507 7.48245006 27.02735064
H34 7.36457470 6.74244996 26.28735053
You can also use the -M option of pcregrep to match multiline input:
$ pcregrep -M 'O[^\n]+\nH[^\n]+\nH[^\n]+' file.txt
O111 3.82376560 6.83952632 25.21182108
H29 3.45376598 7.57952642 25.95182118
H30 4.93376561 6.83952632 25.21182108
O112 2.46658853 6.91893543 28.05848681
H31 2.09658891 7.65893553 28.79848692
H32 3.57658854 6.91893543 28.05848681
O113 6.25457469 6.74244996 26.28735053
H33 5.88457507 7.48245006 27.02735064
H34 7.36457470 6.74244996 26.28735053
Upvotes: 3
Reputation: 11216
If I understand what you want, this sed should work:
sed '/^O/{N;/\nH/{N;/\nH[^\n]*$/p}};d' file
O111 3.82376560 6.83952632 25.21182108
H29 3.45376598 7.57952642 25.95182118
H30 4.93376561 6.83952632 25.21182108
O112 2.46658853 6.91893543 28.05848681
H31 2.09658891 7.65893553 28.79848692
H32 3.57658854 6.91893543 28.05848681
O113 6.25457469 6.74244996 26.28735053
H33 5.88457507 7.48245006 27.02735064
H34 7.36457470 6.74244996 26.28735053
Edit
I messed up: the above won't work if there are two or more O lines together.
The version below will, although it's quite a bit longer:
sed '/^O/{:1;N;/\nH/{N;/\nH[^\n]*$/p};/\nO[^\n]*/{s/.*\n//;b1}};d' file
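The difference between the two commands can be seen on a tiny made-up input with two O lines in a row. This is only a sketch and assumes GNU sed (the one-line `}};d` and `:1;` syntax is not portable to every sed); the sample lines and the variable names first and second are invented for the illustration.

```shell
# Made-up input: two consecutive O lines before an H, H pair.
sample='O1 a
O2 b
H1 c
H2 d'

# First version: N pulls the second O line into the pattern space, the
# /\nH/ test fails, and the group that starts at "O2 b" is never examined.
first=$(printf '%s\n' "$sample" | sed '/^O/{N;/\nH/{N;/\nH[^\n]*$/p}};d')

# Revised version: when the appended line starts with O, everything before
# it is dropped and matching restarts from that newer O line.
second=$(printf '%s\n' "$sample" | sed '/^O/{:1;N;/\nH/{N;/\nH[^\n]*$/p};/\nO[^\n]*/{s/.*\n//;b1}};d')

printf 'first:  [%s]\nsecond: [%s]\n' "$first" "$second"
```

With this input the first command should print nothing, while the revised one should print the three lines starting at "O2 b".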
Upvotes: 4
Reputation: 785541
You can use this awk:
awk '/^O/ { oline=NR; a=$0; next }
/^H/ && oline && NR==(oline+1) { hline=NR; a=a RS $0; next }
/^H/ && hline && NR==(hline+1) {
print a ORS $0;
oline=hline=0
}' file
O111 3.82376560 6.83952632 25.21182108
H29 3.45376598 7.57952642 25.95182118
H30 4.93376561 6.83952632 25.21182108
O112 2.46658853 6.91893543 28.05848681
H31 2.09658891 7.65893553 28.79848692
H32 3.57658854 6.91893543 28.05848681
O113 6.25457469 6.74244996 26.28735053
H33 5.88457507 7.48245006 27.02735064
H34 7.36457470 6.74244996 26.28735053
Upvotes: 2