rdperez
rdperez

Reputation: 1

Comparing 2 files using awk and print if matches

I have two files and I want to compare the column 1 of File1 and column 10 of File2 and should print if it matches. I used this command but it only prints the last line of File2.

awk 'BEGIN{FS=OFS="|"}NR==FNR{a[$10]=$0;next}$1 in a {print a[$1],$0}' File2 File1

File1:

003502|COMMUNICATE|Chat|MEGAMOBILE
003502|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|Entertainment|Promos|MEGAMOBILE
003502|ENTERTAINMENT|Promos|MEGAMOBILE
003502|INFORMATION||MEGAMOBILE

File2:

1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1

Desired Output:

1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|COMMUNICATE|Chat|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|COMMUNICATE|News - Headlines|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|Entertainment|Promos|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|ENTERTAINMENT|Promos|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|INFORMATION||MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|COMMUNICATE|Chat|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|COMMUNICATE|News - Headlines|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|Entertainment|Promos|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|ENTERTAINMENT|Promos|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|INFORMATION||MEGAMOBILE

Upvotes: 0

Views: 89

Answers (2)

Corentin Limier
Corentin Limier

Reputation: 5016

There is a special bash command to do this job : [join][1]

I suggest you to use it instead of awk as it would be more memory efficient.

As @EdMorton said:

join requires both input files to be sorted on the join field

join -t"|" -1 10 -2 1 <(sort -t"|" -k10 -n file2) <(sort -t"|" -k1 -n file1)

Gives

003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|COMMUNICATE|Chat|MEGAMOBILE
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|ENTERTAINMENT|Promos|MEGAMO
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|Entertainment|Promos|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|COMMUNICATE|Chat|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|ENTERTAINMENT|Promos|MEGAMO
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|Entertainment|Promos|MEGAMOBILE

Upvotes: 3

kvantour
kvantour

Reputation: 26551

Since you have duplicate keys, you should keep track of those.

awk 'BEGIN{FS=OFS="|"}
     (NR==FNR) { c[$1]++; a[$1,c[$1]]=$0; next }
     ($10 in c) { for(i=1;i<=c[$10];++i) print $0,a[$10,i] }' file1 file2

In the above, the array c keeps track of how many times we encountered key $1. The entries are then stored in the array a indexed by $1 and the sequence number c[$1]. When reading file2 we check if the key $10 is in the original array c, if so, we process all the stored values in order.

Also, because of the expected output, we had to revert the file order.

Upvotes: 2

Related Questions