Stefano_g
Stefano_g

Reputation: 321

awk - if values match then print file1 and file 2

I googled a lot my problem and tested different solutions, but none seem to work. I even used the same command in advance with success but now I do not manage to get my desired output.

I have file1

AAA;123456789A
BBB;123456789B
CCC;123456789C

And file2

1;2;3;CCC;pippo
1;2;3;AAA;pippo
1;2;3;BBB;pippo
1;2;3;*;pippo

My desired output is this:

1;2;3;CCC;pippo;CCC;123456789C
1;2;3;AAA;pippo;AAA;123456789A
1;2;3;BBB;pippo;BBB;123456789B

I tried with this command:

awk -F";" -v OFS=";" 'FNR == NR {a[$10]=$1; b[$20]=$2; next}($10 in a){ if(match(a[$10],$4)) print $0,a[$10],b[$20]}' file1 file2

But I get this output (only one entry, even with bigger files):

1;2;3;CCC;pippo;CCC;123456789C

What am I doing wrong? If it manages for one it should for all the other. Why is this not happening? Also why if I set a[$1]=$1 it doesn't work?
Thank you for helping! If possible could you explain the answer? So next time I won't have to ask for help!

EDIT: Sorry, I did not mention (since I wanted to keep the example minimal) that in file2 some fields are just "*". And I'd like to add an "else doesn't match do something".

Upvotes: 0

Views: 233

Answers (2)

karakfa
karakfa

Reputation: 67527

awk to the rescue!

$ awk 'BEGIN{FS=OFS=";"} 
     NR==FNR{a[$1]=$0;next} 
            {print $0,a[$4]}' file1 file2

1;2;3;CCC;pippo;CCC;123456789C
1;2;3;AAA;pippo;AAA;123456789A
1;2;3;BBB;pippo;BBB;123456789B

UPDATE: Based on the original input file it was only looking for exact match. If you want to skip the entries where there is no match, you need to qualify the print block with $4 in a

$ awk 'BEGIN{FS=OFS=";"} 
     NR==FNR{a[$1]=$0;next} 
     $4 in a{print $0,a[$4]}' file1 file2

Upvotes: 1

John Zwinck
John Zwinck

Reputation: 249454

join is made for this sort of thing:

$ join -t';' -1 4 -o1.{1..5} -o2.{1..2} <(sort -t';' -k4 file2) <(sort -t';' file1)

1;2;3;AAA;pippo;AAA;123456789A
1;2;3;BBB;pippo;BBB;123456789B
1;2;3;CCC;pippo;CCC;123456789C

The output is what you asked for except for the ordering of lines, which I assume isn't important. The -o options to join are needed because you want the full set of fields; you can try omitting it and you'll get the join field on the left a single time instead, which might also be fine.

Upvotes: 0

Related Questions