Nerdio

Reputation: 1003

AWK script to process one file and read another

I have written an AWK script to process a text file, and now need to extend it so that its output also includes data from a second file, looked up by a field in the first file. Here is an example of what I mean:

File1.txt

abc123~17~yy~12345678
abc456~12~yy~23456789
abc789~34~zz~12345678

File2.txt

abc123~11~22~33~ABC-57
abc456~22~11~33~ABC-99
abc789~33~22~11~ABC-12

My current awk script extracts and processes each line from File1.txt whose 4th field is '12345678', so it finds 2 lines.

I now want to extend this, so from the line I have found, say

abc123~xx~yy~12345678

we take the abc123, search for it in File2.txt, and print the 5th field of that line as well.

E.g. my awk script will search for a token in field 4 of File1.txt, then print that along with field 1 and the 5th field of the File2.txt line whose field 1 matches field 1 from File1.txt.

So if we are searching for 12345678, my output would be

12345678 abc123 ABC-57 17
12345678 abc789 ABC-12 34

(The 17 and 34 have come from field 2 in File1.txt).

In summary then: search for a string in Field 4 of File1.txt, then find the line in File2.txt whose Field 1 matches Field 1 in File1.txt. Then print

File1.Field4 File1.Field1 File2.Field5 File1.Field2

I hope that is clear.

I tried to grep for the 'abc123' string in File2.txt and then select the 5th field. This did not seem to work, and now I think an AWK array built from File2.txt, indexed on field 1 and storing field 5, might do it.

I am not sure how to go about this though.

(Note: this is a stripped-down example of what I want to do; my real files contain more data.)

Upvotes: 1

Views: 6036

Answers (2)

Nerdio

Reputation: 1003

This looks to be the solution I wanted:

BEGIN { FS="~" }               # Set the field separator.
FNR==NR && $4==s {             # If we are in the first file and the fourth field equals s
    a[$1]                      # Create index of field one
    field2[$1]=$2              # Remember field two of file1, keyed on field one
    next                       # Skip to next line
}
($1 in a) {                    # If field one in file2 is in index
    print s,$1,$5,field2[$1]   # Print s, field 1, field 5 and file1's field 2
}

I think that is correct.

My understanding of the solution is this. First it processes File1 in the first block of code, and I can store the data I want in arrays.

It then processes File2 in the second block of code, which runs only when $1 is in array a. If it is, it outputs the data, reading File1's field 2 back from the field2 array.
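One caveat with this pattern: a File1 line that fails the $4==s test never reaches next, so it falls through to the second block and is tested against the index as if it were a File2 line. The sample data does not trigger this, but a File1 line that reuses an already indexed key with a different fourth field would print a spurious row. The following is only a sketch of one way to guard against that, not a replacement for the script above. The script name lookup.awk and the fourth File1 line (abc123~99~yy~00000000) are hypothetical additions made up to show the effect.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

# Sample data from the question, plus one hypothetical extra File1 line
# (abc123 again, with a different fourth field) to exercise the guard.
printf '%s\n' 'abc123~17~yy~12345678' 'abc456~12~yy~23456789' \
              'abc789~34~zz~12345678' 'abc123~99~yy~00000000' > File1.txt
printf '%s\n' 'abc123~11~22~33~ABC-57' 'abc456~22~11~33~ABC-99' \
              'abc789~33~22~11~ABC-12' > File2.txt

# lookup.awk is a hypothetical name for the guarded script.
cat > lookup.awk <<'EOF'
BEGIN { FS = "~" }
FNR == NR {                        # Every line of the first file stops here
    if ($4 == s) field2[$1] = $2   # Remember field 2, keyed on field 1
    next                           # Never let a File1 line reach the block below
}
($1 in field2) {                   # Second file: field 1 was indexed from File1
    print s, $1, $5, field2[$1]
}
EOF

awk -v s='12345678' -f lookup.awk File1.txt File2.txt
# Prints:
# 12345678 abc123 ABC-57 17
# 12345678 abc789 ABC-12 34
```

Storing field 2 as the array value also removes the need for a separate index array, since ($1 in field2) answers the membership question on its own.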

Problem solved, and my real AWK script works a treat.

Many thanks for the help.

Upvotes: 3

Chris Seymour

Reputation: 85775

This one-liner will do the trick:

$ awk -F'~' -v s='12345678' 'FNR==NR&&$4==s{a[$1];next}($1 in a){print s,$1,$5}' file1 file2
12345678 abc123 ABC-57
12345678 abc789 ABC-12

Explanation:

We set the field separator to ~ using the -F option, and set the variable s to the string we want to match using the -v option.
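To see the two options in isolation, here is a small sketch on a made-up one-line input (the a~b~c record is purely illustrative and not taken from the files above). It also shows that -F on the command line and FS set in a BEGIN block should give the same splitting.

```shell
# -F'~' splits each record on "~"; -v s='b' defines the awk variable s
# before any input is read, so it can be used in patterns.
printf 'a~b~c\n' | awk -F'~' -v s='b' '$2 == s { print $1, $3 }'

# Same match with the separator set inside the program instead of via -F.
printf 'a~b~c\n' | awk -v s='b' 'BEGIN { FS = "~" } $2 == s { print $1, $3 }'
```

Both commands print a c, because the second field equals the value passed in through -v.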

As a script with some explanatory comments:

BEGIN { FS="~" }    # Set the field separator. 
FNR==NR && $4==s {  # If we are in the first file and fourth field equals s 
    a[$1]           # Create index of field one
    next            # Skip to next line
}
($1 in a) {         # If field one in file2 is in index
    print s,$1,$5   # Print s, field 1 and field 5
}

You would run this like awk -v s='12345678' -f script.awk file1 file2.

Upvotes: 3
