Reputation: 159

How to search for the first match of a second keyword before a match of the first keyword using awk or grep?

I have a list like this:

TAGDESCRIPTIONS example
TAGS            tmp
TAGS            line
TAGDESCRIPTIONS bar
TAGS            com                      
TAGS            foo

What is the right command in awk or grep to find a match for foo and then print the description from the closest TAGDESCRIPTIONS line above it? So when searching for foo, it prints bar.

Upvotes: 0

Answers (3)

Benjamin W.

Reputation: 52112

Grep isn't suitable for this, as it is mainly meant for filtering individual lines, whereas you are asking about relations across different lines. Grep can be coerced into doing some things across multiple lines by (ab)using the -z flag, which expects null-byte-separated lines, but it's usually not pretty.

Awk¹ allows for a simple solution:

$ awk 'BEGIN{RS="TAGDESCRIPTIONS"}/foo/{print $1}' infile
bar

This sets the record separator RS to TAGDESCRIPTIONS, so the input is interpreted as three records (\n stands for a newline):

<empty record>
 example\nTAGS            tmp\nTAGS            line\n
 bar\nTAGS            com\nTAGS            foo\n

The first one is empty because the file starts with a record separator.
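To see the record splitting for yourself, here is a minimal sketch. It assumes an awk that accepts a multi-character RS (GNU awk, for example); show_records is a hypothetical helper name, and the file is recreated from the question's sample.

```shell
# Recreate the sample input from the question.
cat > infile <<'EOF'
TAGDESCRIPTIONS example
TAGS            tmp
TAGS            line
TAGDESCRIPTIONS bar
TAGS            com
TAGS            foo
EOF

# Hypothetical helper: print one summary line per record
# (record number, number of fields, first field).
show_records() {
    awk 'BEGIN { RS = "TAGDESCRIPTIONS" }
         { printf "record %d: %d fields, first field [%s]\n", NR, NF, $1 }' "$1"
}

show_records infile
```

With GNU awk this prints three lines: an empty record 1, then records whose first fields are example and bar. The default field splitting treats newlines inside a record like any other whitespace, which is why $1 is the description.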

For each record, what we have up to the first newline is the tag description. What we say with

/foo/{print $1}

is this: if the record matches foo, print the first field of the record (the description).

This isn't bombproof at all. If the description consists of multiple words, it only prints the first one. If the description matches instead of a tag, it's a false positive. If the record contains a tag such as foobar but no tag foo, it will still match.

This input example would throw off the simple solution:

TAGDESCRIPTIONS foo
TAGS            blah
TAGDESCRIPTIONS example
TAGS            tmp
TAGS            line
TAGS            foobar
TAGS            barfoo
TAGDESCRIPTIONS bar and more words
TAGS            com
TAGS            foo

There is a tag description with foo, tags containing foo and a tag description with multiple words.
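As a quick check, the simple one-liner can be run against this input. This is only a sketch: tricky.txt and simple_lookup are hypothetical names, and it again assumes an awk with multi-character RS support such as GNU awk.

```shell
# Save the problematic input under a hypothetical file name.
cat > tricky.txt <<'EOF'
TAGDESCRIPTIONS foo
TAGS            blah
TAGDESCRIPTIONS example
TAGS            tmp
TAGS            line
TAGS            foobar
TAGS            barfoo
TAGDESCRIPTIONS bar and more words
TAGS            com
TAGS            foo
EOF

# Hypothetical wrapper around the simple solution from above.
simple_lookup() {
    awk 'BEGIN{RS="TAGDESCRIPTIONS"}/foo/{print $1}' "$1"
}

simple_lookup tricky.txt
```

With GNU awk the output is foo, example and bar on three lines: the first is a description that matched, the second comes from the tags foobar and barfoo, and the third is a truncated description.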

We can fix all that by splitting the records at newlines, then comparing each element except the description to the search string:

awk '
BEGIN { RS = "TAGDESCRIPTIONS *" }

{
    # Split record at newlines, store in arr
    split($0, arr, "\n")

    # Skip first element (description), compare each tag to 'foo'
    # (trailing blanks after the tag are allowed)
    for (i = 2; i <= length(arr); ++i) {
        if (arr[i] ~ " +foo *$") {

            # Matches - print description
            print arr[1]

            # No need to look at the rest of the record
            break
        }
    }
}' infile

resulting in

bar and more words

¹ GNU awk, to be precise, due to the multi-character record separator and the use of the length function on an array.

Upvotes: 1

Ed Morton

Reputation: 203209

$ awk '/TAGDESCRIPTIONS/{d=$2} /foo/{print d}' file
bar
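For readers wondering how this works: the first rule stores the second field of the most recent TAGDESCRIPTIONS line in d, and the second rule prints d for every line that matches foo. The sketch below is a stricter variant, not part of the original one-liner. It assumes that tags should match exactly and that the whole description is wanted, and it uses only POSIX awk features. find_desc and tags.txt are hypothetical names.

```shell
# Sample input with a description "foo", tags containing "foo",
# and a multi-word description (hypothetical file name).
cat > tags.txt <<'EOF'
TAGDESCRIPTIONS foo
TAGS            blah
TAGDESCRIPTIONS example
TAGS            tmp
TAGS            line
TAGS            foobar
TAGS            barfoo
TAGDESCRIPTIONS bar and more words
TAGS            com
TAGS            foo
EOF

# Hypothetical helper: find_desc TAG FILE
find_desc() {
    awk -v tag="$1" '
        # Remember the full description of the most recent TAGDESCRIPTIONS line.
        $1 == "TAGDESCRIPTIONS" {
            desc = $0
            sub(/^TAGDESCRIPTIONS[ \t]*/, "", desc)   # drop the keyword
            sub(/[ \t]+$/, "", desc)                  # drop trailing blanks
            next
        }
        # Print it when a TAGS line carries exactly the requested tag.
        $1 == "TAGS" && $2 == tag { print desc }
    ' "$2"
}

find_desc foo tags.txt
```

This prints bar and more words. Comparing $2 to the tag as a string, rather than matching a regex against the whole line, is what keeps foobar, barfoo and the description foo from producing false positives.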

Upvotes: 2

user557597

I'll give it a shot.
I don't know grep or awk, but below is a raw regex
that uses a PCRE-style lookahead and an inline modifier group.

(?ms)^TAGDESCRIPTIONS\s+(\w*)\s+(?:(?!^TAGDESCRIPTIONS).)+^TAGS\s+foo

Expanded

 (?ms)
 ^ TAGDESCRIPTIONS \s+ 
 ( \w* )                       # (1)
 \s+ 
 (?:
      (?! ^ TAGDESCRIPTIONS )
      . 
 )+
 ^ TAGS \s+ foo

The word bar is captured in group 1. Flesh out the regex as you need.

Output

 **  Grp 0 -  ( pos 68 , len 83 ) 
TAGDESCRIPTIONS bar
TAGS            com                      
TAGS            foo  
 **  Grp 1 -  ( pos 84 , len 3 ) 
bar

Upvotes: 0
