Reputation: 44678
I have two files, both in the same format -- two columns both containing a number, for example:
file 1
1.00 99
2.00 343
3.00 34
...
10.00 343
file 2
1.00 0.4
2.00 0.5
3.00 0.34
...
10.00 0.9
and i want to generate the following file (using, awk, bash perl):
1.00 99 0.4
2.00 343 0.5
3.00 34 0.34
...
10.00 343 0.9
thanks
Upvotes: 3
Views: 2345
Reputation: 360693
join file1 file2
Which assumes that the files are sorted on the join field. If they are not, you can do this:
join <(sort -V file1) <(sort -V file2)
Here's an AWK version (the sort
compensates for AWK's non-deterministic array ordering):
awk '{a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2 | sort -V
It seems shorter and more readable than the Perl answer.
In gawk
4, you can set the array traversal order:
awk 'BEGIN {PROCINFO["sorted_in"] = "@ind_num_asc"} {a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2
and you won't have to use the sort
utility. @ind_num_asc
is Index Numeric Ascending. See Controlling Array Traversal and Array Sorting and Using Predefined Array Scanning Orders with gawk.
Note that -V
(--version-sort
) in the sort
commands above requires GNU sort
from coreutils 7.0 or later. Thanks for @simlev pointing out that it should be used if available.
Upvotes: 8
Reputation: 25137
A Perl-solution
perl -anE 'push @{$h{$F[0]}}, $F[1]; END{ say "$_\t$h{$_}->[0]\t$h{$_}->[1]" for sort{$a<=>$b} keys %h }' file_1 file_2 > file_3
Ok, looking at the awk-oneliner this is shorter then my first try and it has the nicer output then the awk-oneliner and it doesn't use the 'pipe sort -n':
perl -anE '$h{$F[0]}="$h{$F[0]}\t$F[1]"; END{say "$_$h{$_}" for sort {$a<=>$b} keys %h}' file_1 file_2
And the one-liners behave different then the join-example if there are entries with no value in the second column in the first file.
Upvotes: 2
Reputation: 4107
You can do it with Alacon - command-line utility for Alasql database.
It works with Node.js, so you need to install Node.js and then Alasql package:
To join two data from tab-separated files you can use the following command:
> node alacon "SELECT * INTO TSV("main.txt") FROM TSV('data1.txt') data1
JOIN TSV('data2.txt') data2 USING [0]"
This is one very long line. In this example all files have data in "Sheet1" sheets.
Upvotes: 0