emare
emare

Reputation: 83

match two files with awk and output selected fields

I want to compare two files delimited with

; 

with the same field1, output field2 of file1 and field2 field1 of file2.

File1:

16003-Z/VG043;204352
16003/C3;100947
16003/C3;172973
16003/PAB4L;62245
16003;100530
16003;101691
16003;144786

File2:

16003-Z/VG043;568E;0540575;2.59
16003/C3;568E;0000340;2.53
16003/PAB4L;568H;0606738;9.74
16003;568E;0000339;0.71
16003TN9/C3;568E;0042261;3.29

Desired output:

204352;568E;16003-Z/VG043
100947;568E;16003/C3
172973;568E;16003/C3
62245;568H;16003/PAB4L
100530;568E;16003
101691;568E;16003
144786;568E;16003

My try:

awk -F\, '{FS=";"} NR==FNR {a[$1]; next} ($1) in a{ print a[$2]";"$2";"$3}' File1 File2 > Output

The above is not working probably because awk is still obscure to me. The problem is what is driving the output? what $1, $2, etc are referred to what? The a[$2] in my intention is the field2 of file 1....but it is not...

What I get is:

;204352;16003-Z/VG043
;100947;16003/C3
;172973;16003/C3
;62245;16003/PAB4L
;100530;16003
;101691;16003
;144786;16003

thanks for helping

Upvotes: 3

Views: 50

Answers (2)

Rafael
Rafael

Reputation: 7746

This approach reads a file file_1.txt by first into an associative array table. (This is done to associate ids / values across files.) Then, looping over the 2nd file file_2.txt, I print the values in table that match the id field of this file along with the current value:

BEGIN {
    FS=OFS=";"
    while (getline < first)
        table[$1] = $2 FS table[$1]
}

$1 in table {
    len = split(table[$1], parts)
    for (i=1; i<len; i++)
        print parts[i], $2, $1
}

$ awk -v first=file_1.txt -f script.awk file_2.txt

204352;568E;16003-Z/VG043
172973;568E;16003/C3
100947;568E;16003/C3
62245;568H;16003/PAB4L
144786;568E;16003
101691;568E;16003
100530;568E;16003

Upvotes: 1

kvantour
kvantour

Reputation: 26471

This might be what you are after:

awk -F";" '(NR==FNR) { a[$1] = ($1 in a ? a[$1] FS : "") $2; next }
           ($1 in a) { split(a[$1],b); for(i in b) print b[i] FS $2 FS $1 }' file1 file2

This outputs:

204352;568E;16003-Z/VG043
100947;568E;16003/C3
172973;568E;16003/C3
62245;568H;16003/PAB4L
100530;568E;16003
101691;568E;16003
144786;568E;16003

Upvotes: 2

Related Questions