Reputation: 11
I have two big files.
file1 has 160 million lines in this format: id:email
file2 has 45 million lines in this format: id:hash
I need to find every id that appears in both files and write the matches to a third file in this format: email:hash
I tried something like:
awk -F':' 'NR==FNR{a[$1]=$2;next} {print a[$1]":"$2}' test1.in test2.in > res.in
But it's not working :(
Example file1:
9305718:[email protected]
59287478:[email protected]
file2:
21367509:e90100b1b668142ad33e58c17a614696ec04474c
9305718:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e
Desired result:
[email protected]:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e
Upvotes: 1
Views: 106
Reputation: 37424
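In awk (ignoring memory use: this loads all of file1 into an array before it reads file2). Testing ($1 in a) prints only ids that exist in file1 and avoids creating an empty array entry for every unmatched id:
$ awk -F':' 'NR==FNR{a[$1]=$2;next} ($1 in a) {print a[$1]":"$2}' test1.in test2.in
[email protected]:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e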
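Since file2 is the smaller file (45 million lines against 160 million), one possible variation is to hold file2 in the array and stream file1 instead. The sketch below is only an illustration under assumptions: the scratch directory, the res.out name, and the example.com addresses are made up, and it assumes ids are unique and that neither emails nor hashes contain a colon. Matches come out in file1 order.

```shell
# Hypothetical sample data in a scratch directory (addresses are made up).
workdir=$(mktemp -d)
printf '%s\n' '9305718:bob@example.com' \
              '59287478:alice@example.com' > "$workdir/file1"
printf '%s\n' '21367509:e90100b1b668142ad33e58c17a614696ec04474c' \
              '9305718:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e' > "$workdir/file2"

# First file on the command line (NR==FNR): remember id -> hash from file2.
# Second file: stream file1 and print email:hash only for ids already seen.
# "($1 in h)" tests membership without adding empty entries to the array.
awk -F: 'NR==FNR { h[$1] = $2; next }
         ($1 in h) { print $2 ":" h[$1] }' \
    "$workdir/file2" "$workdir/file1" > "$workdir/res.out"

cat "$workdir/res.out"
```

Whether 45 million entries fit in RAM still depends on the machine and the awk implementation, so this reduces the memory problem without removing it.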
Upvotes: 0
Reputation: 88731
With GNU join and GNU bash:
join -t : -j 1 <(sort -t : -k1,1 file1) <(sort -t : -k1,1 file2) -o 1.2,2.2
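Update, a shorter form (note that it sorts on whole lines rather than on the id field, so when one id is a prefix of another the order may not be what join expects):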
join -t: <(sort file1) <(sort file2) -o 1.2,2.2
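For files this large it may help to run sort and join under the same byte-wise collation, so both agree on the ordering, and to keep the sorted copies on disk. The following is a rough sketch rather than a tuned solution: the scratch directory, the .sorted and res.out names, and the example.com addresses are assumptions made for illustration.

```shell
# Hypothetical sample data in a scratch directory (addresses are made up).
workdir=$(mktemp -d)
printf '%s\n' '9305718:bob@example.com' \
              '59287478:alice@example.com' > "$workdir/file1"
printf '%s\n' '21367509:e90100b1b668142ad33e58c17a614696ec04474c' \
              '9305718:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e' > "$workdir/file2"

# Sort each file on the id field only, in plain byte order (LC_ALL=C).
LC_ALL=C sort -t: -k1,1 "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort -t: -k1,1 "$workdir/file2" > "$workdir/file2.sorted"

# Join on field 1 under the same collation; emit email (1.2) and hash (2.2).
LC_ALL=C join -t: -o 1.2,2.2 \
    "$workdir/file1.sorted" "$workdir/file2.sorted" > "$workdir/res.out"

cat "$workdir/res.out"
```

sort spills to temporary files when the input does not fit in memory, so this route needs free disk space rather than RAM.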
Upvotes: 1