Reputation: 159
I have a list like this:
TAGDESCRIPTIONS example
TAGS tmp
TAGS line
TAGDESCRIPTIONS bar
TAGS com
TAGS foo
What is the right command in awk or grep to have it match the TAGDESCRIPTIONS line after getting a match for foo? So when searching for foo, it prints bar.
Upvotes: 0
Views: 93
Reputation: 52112
Grep isn't suitable for this, as it is mainly for filtering individual lines, but you are asking about relations across different lines. Grep can be coerced into doing some things across multiple lines by (ab)using the -z flag, which expects null-byte-separated lines, but it's usually not pretty.
Awk¹ allows for a simple solution:
$ awk 'BEGIN{RS="TAGDESCRIPTIONS"}/foo/{print $1}' infile
bar
This sets the record separator RS to TAGDESCRIPTIONS, so the input is interpreted as three records (\n stands for a newline):
<empty record>
example\nTAGS tmp\nTAGS line\n
bar\nTAGS com\nTAGS foo\n
The first one is empty because the file starts with a record separator.
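To make the record splitting visible, here is a small sketch that prints each record's number, its field count, and its first field. The helper name show_records and the inline sample variable are made up for illustration. Like the command above, it assumes an awk that accepts a multi-character RS (GNU awk does).

```shell
# Sample input copied from the question
sample='TAGDESCRIPTIONS example
TAGS tmp
TAGS line
TAGDESCRIPTIONS bar
TAGS com
TAGS foo'

# show_records (hypothetical helper): read stdin and report how awk splits
# it into records when RS is set to TAGDESCRIPTIONS
show_records() {
    awk 'BEGIN { RS = "TAGDESCRIPTIONS" }
         { printf "record %d: NF=%d first=[%s]\n", NR, NF, $1 }'
}

printf '%s\n' "$sample" | show_records
```

With GNU awk this should report three records: an empty one, then one whose first field is example and one whose first field is bar.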
For each record, what we have up to the first newline is the tag description. What we say with
/foo/{print $1}
is this: if the record matches foo, print the first field of the record (the description).
This isn't bombproof at all. If the description consists of multiple words, it only prints the first one. If the description matches instead of a tag, it's a false positive. If the record contains a tag such as foobar but no tag foo, it will still match.
This input example would throw off the simple solution:
TAGDESCRIPTIONS foo
TAGS blah
TAGDESCRIPTIONS example
TAGS tmp
TAGS line
TAGS foobar
TAGS barfoo
TAGDESCRIPTIONS bar and more words
TAGS com
TAGS foo
There is a tag description with foo, tags containing foo, and a tag description with multiple words.
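As a quick check of those failure modes, the simple one-liner can be run against this input. This is only a sketch: the wrapper name simple_lookup and the tricky variable are made up here, and a multi-character RS (GNU awk) is assumed.

```shell
# The tricky input from above
tricky='TAGDESCRIPTIONS foo
TAGS blah
TAGDESCRIPTIONS example
TAGS tmp
TAGS line
TAGS foobar
TAGS barfoo
TAGDESCRIPTIONS bar and more words
TAGS com
TAGS foo'

# simple_lookup (hypothetical wrapper): the simple solution, reading stdin
simple_lookup() {
    awk 'BEGIN{RS="TAGDESCRIPTIONS"}/foo/{print $1}'
}

printf '%s\n' "$tricky" | simple_lookup
# prints:
#   foo       (the description itself matched, no tag did)
#   example   (matched through the tags foobar and barfoo)
#   bar       (the right record, but the description is cut to one word)
```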
We can fix all that by splitting the records at newlines, then comparing each element except the description to the search string:
awk '
    BEGIN { RS = "TAGDESCRIPTIONS *" }
    {
        # Split record at newlines, store in arr
        split($0, arr, "\n")
        # Skip first element (description), compare the rest to "foo"
        for (i = 2; i <= length(arr); ++i) {
            if (arr[i] ~ " +foo$") {
                # Matches - print description
                print arr[1]
                # No need to look at the rest of the record
                break
            }
        }
    }' infile
resulting in
bar and more words
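If GNU awk is not at hand, a different technique gets the same result with plain POSIX awk features: instead of changing the record separator, read the file line by line and remember the most recent description. The sketch below is one way to do that, not the method used above. The helper name find_description and the variables tag, desc and printed are made up for illustration.

```shell
# find_description (hypothetical helper): print the description of every
# block that has a tag exactly equal to $1; input is read from stdin
find_description() {
    awk -v tag="$1" '
        $1 == "TAGDESCRIPTIONS" {
            # Remember the description: the line minus its leading keyword
            desc = $0
            sub(/^TAGDESCRIPTIONS +/, "", desc)
            printed = 0
            next
        }
        # Exact comparison on the second field, so foobar does not match foo
        $1 == "TAGS" && $2 == tag && !printed {
            print desc
            printed = 1    # report each block only once
        }
    '
}

printf 'TAGDESCRIPTIONS foo\nTAGS blah\nTAGDESCRIPTIONS bar and more words\nTAGS com\nTAGS foo\nTAGS foobar\n' |
    find_description foo
# prints: bar and more words
```

Passing the search term with -v keeps it out of the awk program text, so searching for a different tag means changing only the function argument.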
¹ GNU awk, to be precise, due to the multi-character record separator and the length function.
Upvotes: 1
Reputation:
I'll give it a shot.
I don't know grep or awk, but the below is just a raw regex
that uses PCRE style lookahead and an inline modifier group.
(?ms)^TAGDESCRIPTIONS\s+(\w*)\s+(?:(?!^TAGDESCRIPTIONS).)+^TAGS\s+foo
Expanded
(?ms)
^ TAGDESCRIPTIONS \s+
( \w* )                         # (1)
\s+
(?:
     (?! ^ TAGDESCRIPTIONS )
     .
)+
^ TAGS \s+ foo
The bar word is in capture group 1. Flesh out the regex as you need.
Output
** Grp 0 - ( pos 68 , len 83 )
TAGDESCRIPTIONS bar
TAGS com
TAGS foo
** Grp 1 - ( pos 84 , len 3 )
bar
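One way to try the regex from the command line is Perl, whose regex syntax PCRE is modelled on. This is a sketch under a few assumptions: perl is installed, the whole input fits in memory (it is slurped with -0777), and the helper name group1_matches is made up for illustration.

```shell
# Sample input copied from the question
sample='TAGDESCRIPTIONS example
TAGS tmp
TAGS line
TAGDESCRIPTIONS bar
TAGS com
TAGS foo'

# group1_matches (hypothetical helper): slurp stdin as one string and print
# capture group 1 for every match of the regex above
group1_matches() {
    perl -0777 -ne 'print "$1\n" while /(?ms)^TAGDESCRIPTIONS\s+(\w*)\s+(?:(?!^TAGDESCRIPTIONS).)+^TAGS\s+foo/g'
}

printf '%s\n' "$sample" | group1_matches
# prints: bar
```

Because the pattern ends in a bare foo, it would also accept a tag such as foobar, and \w* captures only the first word of a description.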
Upvotes: 0