Reputation: 12044
I'd like to merge two files according to the content of their 2nd columns.
File 1:
"4742" "209220_at" 2.60700394801826
"104" "209396_s_at" 2.60651442103297
"749" "202409_at" 2.59424724783704
"4168" "209875_s_at" 2.58773204877464
"3973" "1431_at" 2.52832098784342
"1826" "207201_s_at" 2.41685345240968
File2:
"653" "1431_at" 2.14595534191867
"1109" "207201_s_at" 2.13777517447307
"353" "212531_at" 2.12706340284672
"381" "206535_at" 2.11456707231618
"1846" "204534_at" 2.10919474441178
To have in the end:
"3973" "1431_at" 2.52832098784342 "653" "1431_at" 2.14595534191867
"1826" "207201_s_at" 2.41685345240968 "1109" "207201_s_at" 2.13777517447307
I have tried comm
, diff
, some obscure awk
one-liner without any success.
Any help much appreciated.
Ben
Upvotes: 5
Views: 7761
Reputation: 25599
awk 'FNR==NR{a[$2]=$0} NR>FNR && ($2 in a){ print $0,a[$2] } ' file2 file1
Upvotes: 0
Reputation: 246754
awk '
# store the first file, indexed by col2
NR==FNR {f1[$2] = $0; next}
# output only if file1 contains file2's col2
($2 in f1) {print f1[$2], $0}
' file1 file2
Upvotes: 2
Reputation: 42647
You can do that with a combination of the sort
and join
commands. The straightforward approach is
join -j2 <(sort -k2 file1) <(sort -k2 file2)
but that displays slightly differently than you're looking for. It just shows the common join field and then the remaining fields from each file
"1431_at" "3973" 2.52832098784342 "653" 2.14595534191867
"207201_s_at" "1826" 2.41685345240968 "1109" 2.13777517447307
If you need the format exactly as you showed, then you would need to tell join
to output in that manner
join -o 1.1,1.2,1.3,2.1,2.2,2.3 -j2 <(sort -k2 file1) <(sort -k2 file2)
where -o
accepts a list of FILENUM.FIELDNUM
specifiers.
Note that the <()
syntax I'm using isn't POSIX sh, so you should sort to a temporary file if you need POSIX sh syntax.
Upvotes: 5
Reputation: 46399
If the files are small, write a program in a scripting language (Perl, Python and Ruby are all good choices) that reads the first into a hash whose keys are the second column, then read through the second file and use hash lookups to fin out what (if anything) can be joined.
If the files are large then for each file swap the first and second columns, pass them through the unix sort utility, and then in a scripting language merge (plus column reorder) the two sorted files.
Upvotes: 0