Reputation: 223
I am stuck up at a very intriguing part of my Perl code, where I have to merge multiple files based on a common column, but this common column contains different number of records, i.e some records may exist in one file, but not in other and so on.
For example : Here are my three files :
FileA.txt
ID Value
1 45
2 56
3 23
FileB.txt
ID Value
2 57
3 65
5 32
FileC.txt
ID Value
1 21
3 68
4 42
My output should be a combined table as follows :
ID ValueA ValueB ValueC
1 45 0 21
2 56 57 0
3 23 65 68
4 0 0 42
5 0 32 0
I tried using paste
but that just pastes the lists side by side, without considering common columns.
How should I go about it ?
Any suggestions will be much appreciated.
Upvotes: 0
Views: 974
Reputation: 241808
Use a hash of hashes to remember the partial tables. The main key is the ID, the key of the inner hash is the file.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw(say);
use Data::Dumper;
my %table;
for my $letter (qw(A B C)) {
open my $IN, '<', "File$letter.txt" or die "Cannot open: $!";
<$IN>; # Skip the header
while (<$IN>) {
my ($id, $value) = split;
$table{$id}{$letter} = $value;
}
}
say "ID\tValueA\tValueB\tValueC";
for my $id (keys %table) {
say $id, join"\t", q(), map $table{$id}{$_} // 0, qw(A B C);
}
Upvotes: 1
Reputation: 13792
You can initialize a hash (with the ID as the key), and read each file. When you are reading the FileA.txt, put something like this ($id and $value are the values you get from file):
$my_hash{$id} = {VALUE_A=>$value, VALUE_B=>0, VALUE_C=>0};
when reading FileB.txt, for each line:
if( exists $my_hash{$id} ) {
$my_hash{$id}->{VALUE_B=>$value};
}
else {
$my_hash{$id} = {VALUE_A=>0, VALUE_B=>$value, VALUE_C=>0};
}
quite similar for FileC.txt:
if( exists $my_hash{$id} ) {
$my_hash{$id}->{VALUE_C=>$value};
}
else {
$my_hash{$id} = {VALUE_A=>0, VALUE_B=>0, VALUE_C=>$value};
}
and finally, you have all your data in %my_hash
Upvotes: 0