luxfred

Reputation: 33

awk: find matching pattern1 in file containing pattern2

I am parsing many files with awk and looking for correspondences between them. I am stuck on how to find the file that contains pattern1 and then search for pattern2 in that file only.

Example:

file1:
text xyz 122e345a rxyc
abc 25b57790c

file2:
text tio 36e79a89 opgb
abc b0894e35o

file3:
text diowps aaaacc
abc 122e345a

The result I want is:

25b57790c

The first pattern, which I start from, is:

122e345a

The only solution I have so far takes two steps:

FILE=$(awk '$3 == "122e345a" {print FILENAME}' * )
awk '$1 == "abc" {print $2}' $FILE

I can turn it into a one-liner like this:

awk '$1 == "abc" {print $2}' $(awk '$3 == "122e345a" {print FILENAME}' * )

But I would like to avoid calling awk twice. Can it be done in a single awk command?

Upvotes: 2

Views: 203

Answers (2)

markp-fuso

Reputation: 34554

Note: Updated to show complete match on the desired patterns; if the objective is to show partial matches then replace the search patterns accordingly:

partial matching:  $3 ~ /122e345a/
                   $1 ~ /abc/

complete matching: $3 == "122e345a"
                   $1 == "abc"
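
To make the difference between the two forms concrete, here is a small sketch that feeds two lines to awk. The second input line (with the third field 99122e345a00) is made up for illustration and does not come from the question's files.

```shell
# Two sample lines: the first has the exact string in field 3,
# the second (hypothetical) only contains it as a substring.
out=$(printf '%s\n' 'text xyz 122e345a rxyc' 'text xyz 99122e345a00 rxyc' |
awk '
$3 == "122e345a" { print "complete: " $3 }   # whole field must equal the string
$3 ~ /122e345a/  { print "partial: " $3 }    # string may appear anywhere in the field
')
printf '%s\n' "$out"
```

This prints one "complete" line and two "partial" lines: the == test accepts only the first input line, while the ~ test accepts both.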

Assumptions:

  • the first search consists of looking for a line where the third field is a complete match for the string "122e345a", and if found then ...
  • look for a line where the first field is a complete match for the string "abc", and if found then ...
  • print the contents of the second field (of the line that contains string "abc")
  • the string "122e345a" appears first in the file, with the string "abc" showing up either a) in the same line as the first string or b) in a subsequent line
  • if string "abc" shows up multiple times in a file (after string "122e345a" is found), then each occurrence of string "abc" will cause a print command to be issued

One possible awk solution:

awk '
BEGIN                            { found = 0 }
                $3 == "122e345a" { found = 1 }
(found == 1) && $1 == "abc"      { print $2  }
' <file>

  • set variable found=0; the BEGIN block runs once, before any input is read, so this initializes found for the single file given on the command line (it is not re-run for each file if several files are listed)
  • if the string "122e345a" is found in the 3rd field of a line then set found = 1
  • if our variable found is set to 1, and string "abc" is found in the first field of a line, then print the second field of that line

NOTE: You can submit the awk script as a multi-line construct (above) or as a single line, e.g.:

awk 'BEGIN { found = 0 } $3 == "122e345a" { found = 1 } (found == 1) && $1 == "abc" { print $2 }' <file>

Using your sample files (file1/file2/file3), and adding file4 as a copy of file1 with the lines switched:

$ cat file4
abc 25b57790c
text xyz 122e345a rxyc

$ for f in file*
do
    echo "++++++++++++++ file : $f"
    awk 'BEGIN { found = 0 } $3 == "122e345a" { found = 1 } (found == 1) && $1 == "abc" { print $2 }' "$f"
done

++++++++++++++ file : file1
25b57790c
++++++++++++++ file : file2
++++++++++++++ file : file3
++++++++++++++ file : file4

Notice that while file4 has lines that match both search strings, string "122e345a" shows up after string "abc", which goes against one of the assumptions so file4 fails our search.

Upvotes: 1

Kusalananda

Reputation: 15613

file != FILENAME       { found = 0 }
         $3 == a       { found = 1; file = FILENAME }
found && $1 == b       { print $2  }

or, for GNU awk:

BEGINFILE              { found = 0 }
         $3 == a       { found = 1 }
found && $1 == b       { print $2  }

This is very similar to markp's solution (and makes similar assumptions), but may be run on any number of input files without the use of a shell loop:

$ awk -f script.awk a="122e345a" b="abc" file[123]
25b57790c

The scripts also assume that the patterns that you'd like to search for are actually fixed strings in specific columns (as indicated by the question).

Since there's no way of "rewinding" a file in awk, you need to pass over the file twice if you want to find the second string before the first string. The code at the end of the question itself is a solution for that.
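
If two passes are acceptable but two awk processes are not, one possible sketch is to list the files twice in a single invocation. awk applies an operand of the form var=value at the point where it appears among the file names, so the first sweep can record which files contain the string and the second sweep can print from those files only. The names pass and hit are illustrative, and file4 is the reordered copy of file1 from markp-fuso's answer.

```shell
# Recreate the sample files (file1-file3 from the question, file4 with
# the two lines of file1 swapped) in a scratch directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 'text xyz 122e345a rxyc' 'abc 25b57790c' > file1
printf '%s\n' 'text tio 36e79a89 opgb' 'abc b0894e35o' > file2
printf '%s\n' 'text diowps aaaacc' 'abc 122e345a' > file3
printf '%s\n' 'abc 25b57790c' 'text xyz 122e345a rxyc' > file4

# pass=1 and pass=2 take effect when awk reaches them in the operand
# list, so the same process reads every file twice.
out=$(awk '
pass == 1 && $3 == a                      { hit[FILENAME] = 1 }  # remember matching files
pass == 2 && (FILENAME in hit) && $1 == b { print $2 }           # print only from those
' a="122e345a" b="abc" pass=1 file[1234] pass=2 file[1234])
printf '%s\n' "$out"
```

With these four files the value 25b57790c is printed twice, once for file1 and once for file4, because the order of the two lines within a file no longer matters.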

Alternatively, you may save the whole file in a variable and go through that once you find the first string (that solution not included here).
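
One possible shape for such a buffering approach, as a sketch only: instead of the whole file it holds just the second fields of the lines whose first field matches, and releases them once the file has been read and is known to contain the first string. The function name report and the array held are made-up names, and file4 is the reordered copy of file1 from markp-fuso's answer.

```shell
# Recreate the sample files (file1-file3 from the question, file4 with
# the two lines of file1 swapped) in a scratch directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 'text xyz 122e345a rxyc' 'abc 25b57790c' > file1
printf '%s\n' 'text tio 36e79a89 opgb' 'abc b0894e35o' > file2
printf '%s\n' 'text diowps aaaacc' 'abc 122e345a' > file3
printf '%s\n' 'abc 25b57790c' 'text xyz 122e345a rxyc' > file4

out=$(awk '
# Print the values held for the file just finished, then reset the state.
function report(   i) {
    if (found)
        for (i = 1; i <= n; i++) print held[i]
    found = 0; n = 0
}
FNR == 1 { report() }           # a new file starts: settle the previous one
$3 == a  { found = 1 }
$1 == b  { held[++n] = $2 }     # hold the value until the file is decided
END      { report() }           # settle the last file
' a="122e345a" b="abc" file[1234])
printf '%s\n' "$out"
```

Each file is read once, and with these four files 25b57790c is printed for both file1 and file4. The cost is memory proportional to the number of held lines per file, which is small here but could matter for very large inputs.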

Upvotes: 3
