Reputation: 1
I have two files. I want to compare column 1 of File1 with column 10 of File2 and print the combined lines wherever they match. I used this command, but it only pairs the last line of File2 with the lines of File1.
awk 'BEGIN{FS=OFS="|"}NR==FNR{a[$10]=$0;next}$1 in a {print a[$1],$0}' File2 File1
File1:
003502|COMMUNICATE|Chat|MEGAMOBILE
003502|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|Entertainment|Promos|MEGAMOBILE
003502|ENTERTAINMENT|Promos|MEGAMOBILE
003502|INFORMATION||MEGAMOBILE
File2:
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1
Desired Output:
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|COMMUNICATE|Chat|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|COMMUNICATE|News - Headlines|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|Entertainment|Promos|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|ENTERTAINMENT|Promos|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|INFORMATION||MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|COMMUNICATE|Chat|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|COMMUNICATE|News - Headlines|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|Entertainment|Promos|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|ENTERTAINMENT|Promos|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|INFORMATION||MEGAMOBILE
Upvotes: 0
Views: 89
Reputation: 5016
There is a dedicated command-line utility for this job: join. It is a standard POSIX tool, not a bash builtin.
I suggest you use it instead of awk, as it would be more memory efficient.
As @EdMorton said:
join requires both input files to be sorted on the join field
join -t"|" -1 10 -2 1 <(sort -t"|" -k10,10 file2) <(sort -t"|" -k1,1 file1)
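This gives the following (join prints the join field first, followed by the remaining fields of each line):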
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|COMMUNICATE|Chat|MEGAMOBILE
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|ENTERTAINMENT|Promos|MEGAMO
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|Entertainment|Promos|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|COMMUNICATE|Chat|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|ENTERTAINMENT|Promos|MEGAMO
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|Entertainment|Promos|MEGAMOBILE
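If the field order of the desired output matters, join's -o option can list the output fields explicitly. The following is a minimal sketch, assuming the sample data from the question and a POSIX sort and join. The temporary directory and the variable names tmp and joined are illustrative only. LC_ALL=C is set so that sort and join agree on the collation order, and temporary files replace the process substitution so that it also runs in a plain sh.

```shell
#!/bin/sh
# Sketch: recreate the question's sample files in a temporary directory.
export LC_ALL=C            # sort and join must use the same collation order
tmp=$(mktemp -d)

cat > "$tmp/File1" <<'EOF'
003502|COMMUNICATE|Chat|MEGAMOBILE
003502|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|Entertainment|Promos|MEGAMOBILE
003502|ENTERTAINMENT|Promos|MEGAMOBILE
003502|INFORMATION||MEGAMOBILE
EOF

cat > "$tmp/File2" <<'EOF'
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1
EOF

# join needs both inputs sorted lexically (not numerically) on the join field.
sort -t'|' -k10,10 "$tmp/File2" > "$tmp/File2.sorted"
sort -t'|' -k1,1   "$tmp/File1" > "$tmp/File1.sorted"

# -o takes FILE.FIELD pairs: all 12 fields of File2, then all 4 fields of File1.
joined=$(join -t'|' -1 10 -2 1 \
  -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,1.11,1.12,2.1,2.2,2.3,2.4 \
  "$tmp/File2.sorted" "$tmp/File1.sorted")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

With the sample data this prints ten lines, each a full File2 line followed by a full File1 line, so the layout matches the desired output. The order of lines that share a key depends on how sort breaks ties.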
Upvotes: 3
Reputation: 26551
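Since you have duplicate keys, you need to keep track of every line that shares a key. In your command, a[$10]=$0 overwrites the stored line each time the key repeats, so only the last line of File2 is kept.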
awk 'BEGIN{FS=OFS="|"}
(NR==FNR) { c[$1]++; a[$1,c[$1]]=$0; next }
($10 in c) { for(i=1;i<=c[$10];++i) print $0,a[$10,i] }' file1 file2
In the above, the array c keeps track of how many times we encountered key $1. The entries are then stored in the array a, indexed by $1 and the sequence number c[$1]. When reading file2, we check if the key $10 is in the original array c; if so, we process all the stored values in order.
Also, to get the expected output, we had to reverse the file order: file1 is read first and stored, and file2 is read second.
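To see the difference the duplicate keys make, the sketch below runs both the single-slot command from the question and the counter version on the question's sample data. It assumes a POSIX awk and mktemp. The variable names tmp, overwritten and tracked are illustrative only.

```shell
#!/bin/sh
# Sketch: recreate the question's sample files in a temporary directory.
tmp=$(mktemp -d)

cat > "$tmp/File1" <<'EOF'
003502|COMMUNICATE|Chat|MEGAMOBILE
003502|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|Entertainment|Promos|MEGAMOBILE
003502|ENTERTAINMENT|Promos|MEGAMOBILE
003502|INFORMATION||MEGAMOBILE
EOF

cat > "$tmp/File2" <<'EOF'
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1
EOF

# One slot per key: the second File2 line replaces the first under key 003502,
# so only the last File2 line is ever paired with File1.
overwritten=$(awk 'BEGIN{FS=OFS="|"}
  NR==FNR { a[$10]=$0; next }
  $1 in a { print a[$1], $0 }' "$tmp/File2" "$tmp/File1")

# Counter per key: every File1 line is stored under (key, sequence number).
tracked=$(awk 'BEGIN{FS=OFS="|"}
  NR==FNR    { c[$1]++; a[$1,c[$1]]=$0; next }
  ($10 in c) { for (i=1; i<=c[$10]; ++i) print $0, a[$10,i] }' "$tmp/File1" "$tmp/File2")

echo "single slot: $(printf '%s\n' "$overwritten" | awk 'END{print NR}') lines"  # single slot: 5 lines
echo "counter:     $(printf '%s\n' "$tracked" | awk 'END{print NR}') lines"      # counter:     10 lines
rm -rf "$tmp"
```

The single-slot run never mentions Libre Aquarius, while the counter run pairs each of the two File2 lines with all five File1 lines, in File2's own line order.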
Upvotes: 2