Reputation: 1003
I have written an AWK script to process a text file, and now need to extend it so that the output also includes data from a second file, looked up by a field in the first file. Here is an example of what I mean:
File1.txt
abc123~17~yy~12345678
abc456~12~yy~23456789
abc789~34~zz~12345678
File2.txt
abc123~11~22~33~ABC-57
abc456~22~11~33~ABC-99
abc789~33~22~11~ABC-12
My current awk script extracts and processes each line from the File1.txt whose 4th field is '12345678', so it finds 2 lines.
I now want to extend this, so from the line I have found, say
abc123~xx~yy~12345678
we take the abc123 and search for it in File2.txt, then print the 5th field of that line as well.
E.g. my awk script will search for a token in field 4 of File1.txt, then print that along with field 1 of File1.txt and field 5 of File2.txt for the line whose field 1 matches field 1 from File1.txt.
So if we are searching for 12345678, my output would be
12345678 abc123 ABC-57 17
12345678 abc789 ABC-12 34
(The 17 and 34 have come from field 2 in File1.txt).
In summary then: search for a string in Field 4 of File1.txt, find the line in File2.txt whose Field 1 matches Field 1 in File1.txt, then print
File1.Field4 File1.Field1 File2.Field5 File1.Field2
I hope that is clear.
I tried to grep for the 'abc123' string in File2.txt and then select the 5th field. This did not seem to work, and now I think an AWK array built from File2.txt, indexed on field 1 and storing field 5, might do it.
I am not sure how to go about this though.
(Note: this is a stripped-down example of what I want to do; my real requirement has more data in the files.)
Upvotes: 1
Views: 6036
Reputation: 1003
This looks to be the solution I wanted:
BEGIN { FS="~" } # Set the field separator.
FNR==NR && $4==s { # If we are in the first file and fourth field equals s
a[$1] # Create index of field one
field2[$1]=$2
next # Skip to next line
}
($1 in a) { # If field one in file2 is in index
print s,$1,$5,field2[$1] # Print s, field 1, field 5 of file2 and the stored field 2 of file1
}
I think that is correct.
My understanding of the solution is this. First it processes File1 in the first block of code, and I can store the data I want in arrays.
It then processes File2 in the second block of code, which runs only when $1 is in array a. If it is, it outputs the data, reading File1's field 2 back from the field2 array.
Problem solved, and my real AWK script works a treat.
Many thanks for the help.
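For anyone who wants to try this end to end, here is a minimal sketch that recreates the sample files from the question in a temporary directory and runs a slightly hardened variant of the script. The temporary directory, the script.awk file name and the out variable are illustrative choices, not part of the original. The one change in logic is an assumption about the real data: every File1 line is skipped with next, whether or not it matches, so that a non-matching File1 line can never fall through to the second block if its key was already stored.

```shell
#!/bin/sh
# Recreate the sample data from the question (paths are illustrative).
tmpdir=$(mktemp -d)
cat > "$tmpdir/File1.txt" <<'EOF'
abc123~17~yy~12345678
abc456~12~yy~23456789
abc789~34~zz~12345678
EOF
cat > "$tmpdir/File2.txt" <<'EOF'
abc123~11~22~33~ABC-57
abc456~22~11~33~ABC-99
abc789~33~22~11~ABC-12
EOF

# Variant of the script: the first block consumes ALL lines of the first file.
cat > "$tmpdir/script.awk" <<'EOF'
BEGIN { FS = "~" }                  # Set the field separator
FNR == NR {                         # Still reading the first file (File1.txt)
    if ($4 == s) field2[$1] = $2    # Remember field 2 for matching keys only
    next                            # Never let File1 lines reach the block below
}
($1 in field2) {                    # File2.txt line whose key was remembered
    print s, $1, $5, field2[$1]
}
EOF

out=$(awk -v s='12345678' -f "$tmpdir/script.awk" \
    "$tmpdir/File1.txt" "$tmpdir/File2.txt")
printf '%s\n' "$out"
rm -rf "$tmpdir"
```

Note that the lines come out in File2.txt order, and a key that is missing from File2.txt produces no output at all.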
Upvotes: 3
Reputation: 85775
This one-liner will do the trick:
$ awk -F'~' -v s='12345678' 'FNR==NR&&$4==s{a[$1];next}($1 in a){print s,$1,$5}' file1 file2
12345678 abc123 ABC-57
12345678 abc789 ABC-12
Explanation:
We set the field separator to ~ using the -F option, and set the variable s to the string we want to match using the -v option.
As a script with some explanatory comments:
BEGIN { FS="~" } # Set the field separator.
FNR==NR && $4==s { # If we are in the first file and fourth field equals s
a[$1] # Create index of field one
next # Skip to next line
}
($1 in a) { # If field one in file2 is in index
print s,$1,$5 # Print s, field 1 and field 5
}
You would run this like awk -v s='12345678' -f script.awk file1 file2.
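The question also floated the idea of an array built from File2.txt, indexed on field 1. As a rough sketch of that alternative (not the approach used above), the files can be passed in the opposite order: File2.txt is loaded into a lookup array first, then File1.txt is scanned, so File1's own fields such as field 2 are available directly and the output follows File1.txt order. The array name code, the out variable and the temporary directory are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Recreate the sample data from the question (paths are illustrative).
tmpdir=$(mktemp -d)
cat > "$tmpdir/File1.txt" <<'EOF'
abc123~17~yy~12345678
abc456~12~yy~23456789
abc789~34~zz~12345678
EOF
cat > "$tmpdir/File2.txt" <<'EOF'
abc123~11~22~33~ABC-57
abc456~22~11~33~ABC-99
abc789~33~22~11~ABC-12
EOF

# Note the argument order: File2.txt first, File1.txt second.
out=$(awk -F'~' -v s='12345678' '
    FNR == NR { code[$1] = $5; next }   # File2.txt: map field 1 to field 5
    $4 == s && ($1 in code) {           # File1.txt: matching rows with a known key
        print s, $1, code[$1], $2
    }
' "$tmpdir/File2.txt" "$tmpdir/File1.txt")

printf '%s\n' "$out"
rm -rf "$tmpdir"
```

The trade-off is memory: this version holds one entry per File2.txt line, whereas the version above only stores the keys from File1.txt that match s.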
Upvotes: 3