Reputation: 103
I'm trying to join together multiple results stored in CSV files, with the first column as the index.
The problem is that I have a various number of files, and the index results can vary from file to file, with some having less results.
In Linux, what would be the easiest way to script the process to join all the files in one, with blank fields for empty results? I have experienced Linux join, paste and column with no luck. Mybe I should try with another tool, or maybe Python or Perl?
The data files look like this:
file1:
header1 header2 header3
result_A 10 11 12
result_B 13 14 15
result_C 16 17 18
result_D 19 20 21
result_E 22 23 24
result_F 25 26 27
file2:
header1 header2 header3
result_B 40 41 42
result_F 43 44 45
file3:
header1 header2 header3
result_C 60 61 62
result_D 63 64 65
result_F 66 67 68
And hopefully the results should look like this:
file1 file1 file1 file2 file2 file2 file3 file3 file3
header1 header2 header3 header1 header2 header3 header1 header2 header3
result_A 10 11 12
result_B 13 14 15 40 41 42
result_C 16 17 18 60 61 62
result_D 19 20 21 63 64 65
result_E 22 23 24
result_F 25 26 27 43 44 45 66 67 68
Upvotes: 2
Views: 2607
Reputation: 393389
UNIX join
should get you a long way:
join -a 1 -e '0' "-t " -j 1
<(sort <(join -a 1 -e '0' "-t " -j 1 <(sort file1) <(sort file2)))
<(sort file3)
(all on one line). Note that "-t "
has the TAB character within quotes. Enter it using ^V<Tab>
.
If you know the input is sorted, it would be better to use
join -a 1 -e '0' "-t " -j 1
<(join -a 1 -e '0' "-t " -j 1 file1 file2)
file3
(all on one line) prints:
id header1 header2 header3 header1 header2 header3 header1 header2 header3
result_A 10 11 12
result_B 13 14 15 40 41 42
result_C 16 17 18 60 61 62
result_D 19 20 21 63 64 65
result_E 22 23 24
result_F 25 26 27 43 44 45 66 67 68
Now, as you can see, on my Cygwin system -e '0'
apparently doesn't work as advertised. I'd suggest trying this on a different system though, as I don't imagine having uncovered such an essential bug in a standard UNIX utility.
Upvotes: 2