I am parsing many files with awk and searching for matches. I am stuck trying to find the file that contains pattern1 and then search for pattern2 only in that file.
Example:
file1:
text xyz 122e345a rxyc
abc 25b57790c
file2:
text tio 36e79a89 opgb
abc b0894e35o
file 3:
text diowps aaaacc
abc 122e345a
The first pattern I have is:
122e345a
and the result I want is:
25b57790c
The only solution I have so far is to do it in two steps:
FILE=$(awk '$3 == "122e345a" {print FILENAME}' * )
awk '$1 == "abc" {print $2}' $FILE
I can combine the two steps into a one-liner:
awk '$1 == "abc" {print $2}' $(awk '$3 == "122e345a" {print FILENAME}' * )
But I would like to avoid calling awk twice. Can this be done with a single awk command?
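One thing to watch with the two-step version: if no file contains the first pattern, FILE is empty, the second awk receives no file operands, and it falls back to reading standard input, so it appears to hang. A minimal sketch of a guard, assuming the sample files are recreated in a scratch directory made with mktemp -d:

```shell
#!/bin/sh
# Sketch: the two-step approach, guarded for the case where no file matches.
# Recreate the sample files from the example in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'text xyz 122e345a rxyc\nabc 25b57790c\n' > file1
printf 'text tio 36e79a89 opgb\nabc b0894e35o\n' > file2
printf 'text diowps aaaacc\nabc 122e345a\n' > file3

# Step 1: print the name of every file whose 3rd field matches
# (a file is listed once per matching line).
FILE=$(awk '$3 == "122e345a" {print FILENAME}' file*)

# Step 2: only run the second awk when step 1 found something; with no
# file operands awk would read standard input instead.
if [ -n "$FILE" ]; then
    # $FILE is left unquoted so each name becomes its own argument
    # (this assumes file names without spaces or glob characters).
    result=$(awk '$1 == "abc" {print $2}' $FILE)
else
    result=""
fi
printf '%s\n' "$result"
```

With the sample files this prints 25b57790c; if step 1 finds nothing, result stays empty instead of blocking on the terminal.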
Answer:
Note: Updated to show complete (exact) matches on the desired patterns; if the objective is to show partial matches, replace the search patterns accordingly:
- partial matching: $3 ~ /122e345a/ and $1 ~ /abc/
- complete matching: $3 == "122e345a" and $1 == "abc"
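To illustrate the difference on one line from file1 (a small sketch; the shortened pattern 122e is only for illustration): == compares the whole field as a string, while ~ treats the right-hand side as a regular expression that only needs to match part of the field, so regex metacharacters such as . would need escaping.

```shell
#!/bin/sh
# Sketch: exact (==) vs partial (~) field matching on one sample line.
line='text xyz 122e345a rxyc'

# Exact comparison: field 3 must be exactly "122e345a"
exact=$(printf '%s\n' "$line" | awk '$3 == "122e345a" {print "match"}')

# Regex match: field 3 only needs to contain "122e" somewhere
partial=$(printf '%s\n' "$line" | awk '$3 ~ /122e/ {print "match"}')

# Exact comparison against a substring does not match
exact_sub=$(printf '%s\n' "$line" | awk '$3 == "122e" {print "match"}')

printf 'exact=%s partial=%s exact_sub=%s\n' "$exact" "$partial" "$exact_sub"
```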
Assumptions:
- search each file for the string "122e345a", and if found then ...
- search the same file for the string "abc", and if found then ...
- print the second field (from the line containing "abc")
- the string "122e345a" appears first in the file, with the string "abc" showing up either a) in the same line as the first string or b) in a subsequent line
- if the string "abc" shows up multiple times in a file (after the string "122e345a" is found), then each occurrence of "abc" will cause a print command to be issued

One possible awk solution:
awk '
BEGIN { found = 0 }
$3 == "122e345a" { found = 1 }
(found == 1) && $1 == "abc" { print $2 }
' <file>
Where:
- found = 0: since this is part of the BEGIN block, it is performed only once, before the first line of input is read (ie, we're initializing found); it is not repeated for each new file, which is why this script is run against one file at a time
- if the string "122e345a" is found in the 3rd field of a line, then set found = 1
- if found is set to 1 and the string "abc" is found in the first field of a line, then print the second field of that line

NOTE: You can submit the awk script as a multi-line construct (above) or as a single line, eg:
awk 'BEGIN { found = 0 } $3 == "122e345a" { found = 1 } (found == 1) && $1 == "abc" { print $2 }' <file>
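Because BEGIN runs only once, passing several files to this script in a single awk call lets found carry over from one file to the next. A sketch with two hypothetical files, fileA (first pattern only) and fileB ("abc" only), where a line from fileB is printed even though fileB never contains "122e345a":

```shell
#!/bin/sh
# Sketch: why the BEGIN-based script is meant to be run one file at a time.
# fileA and fileB are hypothetical test files.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'text xyz 122e345a rxyc\n' > fileA
printf 'abc should_not_print\n' > fileB

# found is set while reading fileA and is never reset, because BEGIN
# runs once before the first file, not once per file.
result=$(awk 'BEGIN { found = 0 } $3 == "122e345a" { found = 1 } (found == 1) && $1 == "abc" { print $2 }' fileA fileB)
printf '%s\n' "$result"
```

Running one file per awk call, as in the loop that follows, avoids this.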
Using your sample files (file1/file2/file3), and adding file4 as a copy of file1 with the lines switched:
$ cat file4
abc 25b57790c
text xyz 122e345a rxyc
$ for f in file*
do
echo "++++++++++++++ file : $f"
awk 'BEGIN { found = 0 } $3 == "122e345a" { found = 1 } (found == 1) && $1 == "abc" { print $2 }' "$f"
done
++++++++++++++ file : file1
25b57790c
++++++++++++++ file : file2
++++++++++++++ file : file3
++++++++++++++ file : file4
Notice that while file4 has lines that match both search strings, the string "122e345a" shows up after the string "abc"; this goes against one of the assumptions, so file4 fails our search.
Answer:
file != FILENAME { found = 0 }
$3 == a { found = 1; file = FILENAME }
found && $1 == b { print $2 }
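A common alternative for the per-file reset, shown here as a sketch of a swapped-in idiom rather than the script above, keys on FNR == 1: FNR is the record number within the current input file, so it is 1 on the first line of each file in any POSIX awk. Using the sample files from the question:

```shell
#!/bin/sh
# Sketch: reset the flag at the first line of every input file via FNR.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'text xyz 122e345a rxyc\nabc 25b57790c\n' > file1
printf 'text tio 36e79a89 opgb\nabc b0894e35o\n' > file2
printf 'text diowps aaaacc\nabc 122e345a\n' > file3

result=$(awk '
    FNR == 1         { found = 0 }   # first line of each file: forget earlier matches
    $3 == a          { found = 1 }   # first pattern seen in this file
    found && $1 == b { print $2 }    # second pattern on or after that line
' a="122e345a" b="abc" file1 file2 file3)
printf '%s\n' "$result"
```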
or, for GNU awk:
BEGINFILE { found = 0 }
$3 == a { found = 1 }
found && $1 == b { print $2 }
This is very similar to markp's solution (and makes similar assumptions), but may be run on any number of input files without the use of a shell loop:
$ awk -f script.awk a="122e345a" b="abc" file[123]
25b57790c
The scripts also assume that the patterns you'd like to search for are fixed strings in specific columns (as indicated by the question).
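If the strings could appear anywhere on a line rather than in fixed columns, one hedged option is index(), which tests for a literal substring (no regex). This loosens the match: in the sketch below file3 is picked up too, because its second line both starts with abc and contains 122e345a. Printing $2 is an assumption carried over from the question:

```shell
#!/bin/sh
# Sketch, assuming the patterns may appear anywhere on a line.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'text xyz 122e345a rxyc\nabc 25b57790c\n' > file1
printf 'text tio 36e79a89 opgb\nabc b0894e35o\n' > file2
printf 'text diowps aaaacc\nabc 122e345a\n' > file3

result=$(awk '
    FNR == 1              { found = 0 }   # reset per file
    index($0, a)          { found = 1 }   # literal substring test, no regex
    found && index($0, b) { print $2 }    # second string anywhere on the line
' a="122e345a" b="abc" file1 file2 file3)
printf '%s\n' "$result"
```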
Since there's no way of "rewinding" a file in awk, you need to pass over the file twice if you want to find the second string before the first string. The code at the end of the question itself is a solution for that.
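Passing the same file twice on the command line gives awk those two passes in a single call. A sketch using the common NR == FNR idiom (true only while reading the first operand), applied to file4 from the previous answer, where "abc" comes before "122e345a"; it handles one file per invocation, so several files would still need a shell loop:

```shell
#!/bin/sh
# Sketch: two passes over one file in a single awk call.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'abc 25b57790c\ntext xyz 122e345a rxyc\n' > file4

result=$(awk '
    NR == FNR { if ($3 == a) found = 1; next }   # pass 1: only note whether the first pattern occurs
    found && $1 == b { print $2 }                # pass 2: the file is read again from the top
' a="122e345a" b="abc" file4 file4)
printf '%s\n' "$result"
```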
Alternatively, you may save the whole file in a variable and go through that once you find the first string (that solution not included here).
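A sketch of that buffering idea, done per file in one awk call: every line of the current file is stored in an array (rather than a single string variable), and when the next file starts (FNR == 1) or input ends (END), the buffered lines are scanned only if 122e345a was seen. The function name report_file is hypothetical. Unlike the scripts above, this also matches file4, because the order of the two strings no longer matters:

```shell
#!/bin/sh
# Sketch: buffer each file, then decide once the whole file has been read.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'text xyz 122e345a rxyc\nabc 25b57790c\n' > file1
printf 'text tio 36e79a89 opgb\nabc b0894e35o\n' > file2
printf 'text diowps aaaacc\nabc 122e345a\n' > file3
printf 'abc 25b57790c\ntext xyz 122e345a rxyc\n' > file4

result=$(awk '
    # report_file (hypothetical name): if the buffered file contained
    # pattern a, print field 2 of each buffered line whose first field
    # is b, then empty the buffer.
    function report_file(    i, f) {
        if (found)
            for (i = 1; i <= n; i++) {
                split(lines[i], f)
                if (f[1] == b) print f[2]
            }
        n = 0
        found = 0
    }
    FNR == 1 && NR > 1 { report_file() }   # a new file starts: finish the previous one
    { lines[++n] = $0 }                    # buffer every line of the current file
    $3 == a { found = 1 }                  # first pattern may appear anywhere in the file
    END { report_file() }                  # finish the last file
' a="122e345a" b="abc" file1 file2 file3 file4)
printf '%s\n' "$result"
```

This trades memory (one file held at a time) for a single pass over each file.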