git

Reputation: 151

How to search groups of 3 lines with a certain pattern?

I want to search for and print groups of 3 consecutive lines in the following file:

C30                1.86494717          7.48500210          9.88662475
O86                1.23405589          6.84423578         21.24967645
O88                5.28196032          8.12576842         21.24967645
O90                3.01950053          8.12576842          3.03566806
C32                8.01630633          7.48500210         15.95796089
O92                1.07505084          8.12576842          9.10700419
O94                7.22641001          8.12576842         15.17834032
O96                6.07185664          6.20346947         22.02929701
xxx                xxxxxxxxxx          xxxxxxxxxx         xxxxxxxxxxx
O111               3.82376560          6.83952632         25.21182108
H29                3.45376598          7.57952642         25.95182118
H30                4.93376561          6.83952632         25.21182108
O112               2.46658853          6.91893543         28.05848681
H31                2.09658891          7.65893553         28.79848692
H32                3.57658854          6.91893543         28.05848681
O113               6.25457469          6.74244996         26.28735053
H33                5.88457507          7.48245006         27.02735064
H34                7.36457470          6.74244996         26.28735053

In this case I want to find the lines that follow the pattern "O", "H", "H":

    Ox               
    Hx  
    Hx

I tried something with grep but it didn't work properly.

Any suggestions?

Many thanks in advance.

Upvotes: 3

Views: 109

Answers (5)

Ed Morton

Reputation: 203995

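awk '
# k is the first character of the current line
{ k = substr($0,1,1) }
# an H line directly preceded by an H line and, before that, an O line
(k=="H") && (prevNR["H"]==(NR-1)) && (prevNR["O"]==(NR-2)) {
    print prevRec["O"] ORS prevRec["H"] ORS $0
}
# remember the line number and text of the latest line for each first character
{ prevNR[k]=NR; prevRec[k]=$0 }
' file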
O111               3.82376560          6.83952632         25.21182108
H29                3.45376598          7.57952642         25.95182118
H30                4.93376561          6.83952632         25.21182108
O112               2.46658853          6.91893543         28.05848681
H31                2.09658891          7.65893553         28.79848692
H32                3.57658854          6.91893543         28.05848681
O113               6.25457469          6.74244996         26.28735053
H33                5.88457507          7.48245006         27.02735064
H34                7.36457470          6.74244996         26.28735053

Upvotes: 2

hemflit

Reputation: 2999

gawk -vRS='(^|\n)O[^\n]*\nH[^\n]*\nH[^\n]*' '{print RT}'

^ matches the beginning of the file, not the beginning of any line (this may be a dark corner).
RT is the text that matched RS.
You need GNU Awk for this; standard Awk doesn't allow regex record separators.
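
Where GNU Awk is not available, the same grouping can probably be done without a regex record separator. The following is a minimal sketch, assuming any POSIX awk: it keeps the two previous lines in variables and prints all three once the O, H, H sequence is complete. The function name `find_ohh`, the variables `p1` and `p2`, and the sample input are illustrative and not part of the original answer.

```shell
# Print every group of three consecutive lines that start with O, H, H.
find_ohh() {
  awk '
    # p2 and p1 hold the two lines before the current one.
    /^H/ && p1 ~ /^H/ && p2 ~ /^O/ { print p2; print p1; print }
    # Shift the window by one line.
    { p2 = p1; p1 = $0 }
  ' "$@"
}

# Hypothetical sample input; only the O1, H1, H2 lines form a complete group.
printf 'C1 x\nO1 a\nH1 b\nH2 c\nO2 d\nH3 e\n' | find_ohh
```

With no arguments the function reads standard input; a file name can be passed instead, for example `find_ohh file`.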

Upvotes: 3

heemayl

Reputation: 42087

Using a newer version of GNU grep, which has the -z option to match multiline input:

$ grep -Pzo 'O[^\n]+\nH[^\n]+\nH[^\n]+' file.txt
O111               3.82376560          6.83952632         25.21182108
H29                3.45376598          7.57952642         25.95182118
H30                4.93376561          6.83952632         25.21182108
O112               2.46658853          6.91893543         28.05848681
H31                2.09658891          7.65893553         28.79848692
H32                3.57658854          6.91893543         28.05848681
O113               6.25457469          6.74244996         26.28735053
H33                5.88457507          7.48245006         27.02735064
H34                7.36457470          6.74244996         26.28735053

You can also use the -M option of pcregrep to match multiline input:

$ pcregrep -M 'O[^\n]+\nH[^\n]+\nH[^\n]+' file.txt 
O111               3.82376560          6.83952632         25.21182108
H29                3.45376598          7.57952642         25.95182118
H30                4.93376561          6.83952632         25.21182108
O112               2.46658853          6.91893543         28.05848681
H31                2.09658891          7.65893553         28.79848692
H32                3.57658854          6.91893543         28.05848681
O113               6.25457469          6.74244996         26.28735053
H33                5.88457507          7.48245006         27.02735064
H34                7.36457470          6.74244996         26.28735053

Upvotes: 3

123

Reputation: 11216

If I understand what you want, this sed should work:

sed '/^O/{N;/\nH/{N;/\nH[^\n]*$/p}};d' file

O111               3.82376560          6.83952632         25.21182108
H29                3.45376598          7.57952642         25.95182118
H30                4.93376561          6.83952632         25.21182108
O112               2.46658853          6.91893543         28.05848681
H31                2.09658891          7.65893553         28.79848692
H32                3.57658854          6.91893543         28.05848681
O113               6.25457469          6.74244996         26.28735053
H33                5.88457507          7.48245006         27.02735064
H34                7.36457470          6.74244996         26.28735053

Edit

I messed up: the above won't work if there are two O lines in a row.

The version below handles that case, although it's quite a bit longer:

sed '/^O/{:1;N;/\nH/{N;/\nH[^\n]*$/p};/\nO[^\n]*/{s/.*\n//;b1}};d' file
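
The difference between the two commands is easiest to see on a small input. The following sketch assumes GNU sed (both commands rely on its handling of `\n` inside brackets and of `;` after a label); the function names and the four-line sample are hypothetical.

```shell
# Hypothetical sample: two O lines in a row, followed by two H lines.
sample() { printf 'O1 a\nO2 b\nH1 c\nH2 d\n'; }

# First version: N pulls in O2 while testing O1, the pair is discarded,
# and the O2, H1, H2 group is never examined.
first_try() { sed '/^O/{N;/\nH/{N;/\nH[^\n]*$/p}};d'; }

# Edited version: when the appended line starts with O, drop the older
# line and restart the check from label 1.
second_try() { sed '/^O/{:1;N;/\nH/{N;/\nH[^\n]*$/p};/\nO[^\n]*/{s/.*\n//;b1}};d'; }

sample | first_try     # prints nothing
sample | second_try    # prints the O2, H1 and H2 lines
```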

Upvotes: 4

anubhava

Reputation: 785541

You can use this awk:

awk '/^O/ { oline=NR; a=$0; next }
     /^H/ && oline && NR==(oline+1) { hline=NR; a=a RS $0; next }
     /^H/ && hline && NR==(hline+1) {
       print a ORS $0;
       oline=hline=0
}' file

O111               3.82376560          6.83952632         25.21182108
H29                3.45376598          7.57952642         25.95182118
H30                4.93376561          6.83952632         25.21182108
O112               2.46658853          6.91893543         28.05848681
H31                2.09658891          7.65893553         28.79848692
H32                3.57658854          6.91893543         28.05848681
O113               6.25457469          6.74244996         26.28735053
H33                5.88457507          7.48245006         27.02735064
H34                7.36457470          6.74244996         26.28735053

Upvotes: 2
