Arthur Cheuk

Reputation: 135

Look for patterns across different lines

I have a file like this (test.txt):

abc
12
34
def
56
abc
ghi
78
def
90

I would like to find the 78, which is enclosed by "abc\nghi" and "def". Currently, I know I can do this with:

cat test.txt | awk '/abc/,/def/' | awk '/ghi/,/def/'

Is there any better way?

Upvotes: 1

Views: 99

Answers (5)

ghoti

Reputation: 46826

You could do this with sed. It's not ideal in that it doesn't actually understand records, but it might work for you...

sed -Ene 'H;${x;s/.*\nabc\nghi\n([0-9]+)\ndef\n.*/\1/;p;}' input.txt

Here's what's basically going on:

  • H - appends the current line to sed's "hold space"
  • ${ - specifies the start of a series of commands that will be run once we come to the end of the file
  • x - swaps the hold space with the pattern space, so that future substitutions will work on what was stored using H
  • s/../../ - analyses the pattern space (which is now multi-line), capturing the data specified in your question, replacing the entire pattern space with the bracketed expression...
  • p - prints the result.

One important factor here is that the regular expression is ERE, so the -E option is important. If your version of sed uses some other option to enable support for ERE, then use that option instead.

Another consideration is that the regex above assumes Unix-style line endings. If you try to process a text file that was generated on DOS or Windows, the regex may need to be a little different.
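Two hedged tweaks to the command above, sketched under the assumption of GNU sed and the sample data from the question. First, moving p onto the substitution (s/.../.../p) prints only when the pattern actually matched; the original prints the whole hold space when it does not. Second, stripping carriage returns with tr beforehand lets the same regex work on DOS/Windows files. The helper name extract_sed is made up for illustration.

```shell
# Sample data from the question, deliberately written with DOS line endings.
printf '%s\r\n' abc 12 34 def 56 abc ghi 78 def 90 > input.txt

# extract_sed is a hypothetical helper name, not part of any tool.
extract_sed() {
  # tr drops the carriage returns; the p flag on s/// prints only on a match.
  tr -d '\r' < "$1" |
    sed -En 'H;${x;s/.*\nabc\nghi\n([0-9]+)\ndef\n.*/\1/p;}'
}

extract_sed input.txt
```

Like the original, this still expects at least one more line after the closing def.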

Upvotes: 0

karakfa

Reputation: 67467

grep alternative

$ grep -Pazo '(?s)(?<=abc\nghi)(.*)(?=def)' file

but I think awk will be better
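One thing to be aware of: the lookbehind above stops before the newline that follows ghi, and with -z each match is written NUL-terminated, so the raw output is the 78 wrapped in newlines plus a trailing NUL byte. A sketch of a tighter variant, assuming GNU grep built with PCRE support (-P); the helper name extract_grep is made up for illustration.

```shell
# Sample data from the question.
printf '%s\n' abc 12 34 def 56 abc ghi 78 def 90 > file

# extract_grep is a hypothetical helper name.
extract_grep() {
  # The lookarounds absorb the surrounding newlines; [^\n]* keeps the match to one line.
  # With -z every match ends in NUL, which tr turns back into a newline.
  grep -Pazo '(?<=abc\nghi\n)[^\n]*(?=\ndef)' "$1" | tr '\0' '\n'
}

extract_grep file
```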

Upvotes: 0

kvantour

Reputation: 26471

This is not really clean, but you can redefine your record separator as the regular expression abc\nghi\n|\ndef. This, however, creates multiple records, and you need to keep track of which ones sit between the right separators. With GNU awk you can check which text matched RS using RT.

awk 'BEGIN{RS="abc\nghi\n|\ndef"}
     (RT~/abc/){s=1; next}
     (s==1)&&(RT~/def/){print $0}
     {s=0}' file

This does the following:

  • sets RS to match either abc\nghi\n or \ndef.
  • checks which separator ended the current record: if RT contains abc, the next record starts right after the opening pair.
  • if the opening pair was found and the next record's RT contains def, prints that record.

Upvotes: 0

Sundeep

Reputation: 23667

One way is to use flags

$ awk '/ghi/ && p~/abc/{f=1} f; /def/{f=0} {p=$0}' test.txt
ghi
78
def

  • {p=$0} saves the current line so the next line can test it as the previous line
  • /ghi/ && p~/abc/{f=1} sets the flag if the current line contains ghi and the previous line contains abc
  • f; prints the input record as long as the flag is set
  • /def/{f=0} clears the flag if the line contains def


If you only want the lines between these two boundaries:

$ awk '/ghi/ && p~/abc/{f=1; next} /def/{f=0} f; {p=$0}' test.txt
78
$ awk '/12/ && p~/abc/{f=1; next} /def/{f=0} f; {p=$0}' test.txt
34
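The same flag technique can be parameterised with -v so the boundary lines are not hard-coded. This is only a sketch: it assumes whole-line comparisons (==) instead of the regex matches used above, and the function name between and the variable names first, second and stop are made up for illustration.

```shell
# Sample data from the question.
printf '%s\n' abc 12 34 def 56 abc ghi 78 def 90 > test.txt

# between FIRST SECOND STOP FILE (hypothetical helper):
# print the lines that follow the pair FIRST,SECOND, up to but excluding STOP.
between() {
  awk -v first="$1" -v second="$2" -v stop="$3" '
    $0 == second && p == first { f = 1; p = $0; next }  # opening pair seen: set the flag
    $0 == stop                 { f = 0 }                # closing line clears the flag
    f                                                   # print while the flag is set
                               { p = $0 }               # remember the previous line
  ' "$4"
}

between abc ghi def test.txt   # the case from the question
between abc 12 def test.txt    # the second example above
```

Keep in mind that awk compares numeric-looking values numerically, so a line reading 012 would also count as equal to 12 here.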

See also How to select lines between two patterns?

Upvotes: 2

RomanPerekhrest

Reputation: 92854

awk solution:

awk '/ghi/ && r=="abc"{ f=1; n=NR+1 }f && NR==n{ v=$0 }v && NR==n+1{ print v }{ r=$0 }' file

The output:

78
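The one-liner is dense, so here is the same set of rules spread over several lines with comments; the behaviour should be unchanged. Note that it prints the line after ghi once one further line has been read, without checking that this further line is def. The helper name after_pair is made up for illustration.

```shell
# Sample data from the question.
printf '%s\n' abc 12 34 def 56 abc ghi 78 def 90 > file

# after_pair is a hypothetical helper name.
after_pair() {
  awk '
    /ghi/ && r == "abc" { f = 1; n = NR + 1 }  # pair seen: the target is the next line
    f && NR == n        { v = $0 }             # remember the target line
    v && NR == n + 1    { print v }            # print it once one more line has been read
                        { r = $0 }             # keep the current line as "previous"
  ' "$1"
}

after_pair file
```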

Bonus GNU awk approach:

awk -v RS= 'match($0,/\nabc\nghi\n(.+)\ndef/,a){ print a[1] }' file

Upvotes: -1
