user1691717
user1691717

Reputation: 223

Merge files based on common column (but containing different number of records) and put zeros for uncommon records

I am stuck up at a very intriguing part of my Perl code, where I have to merge multiple files based on a common column, but this common column contains different number of records, i.e some records may exist in one file, but not in other and so on.

For example : Here are my three files :

FileA.txt

ID  Value
 1   45
 2   56
 3   23

FileB.txt

ID  Value
 2   57
 3   65
 5   32

FileC.txt

ID  Value
 1   21
 3   68
 4   42

My output should be a combined table as follows :

ID  ValueA  ValueB  ValueC
 1   45       0      21
 2   56       57     0
 3   23       65     68
 4    0       0      42
 5    0       32     0

I tried using paste but that just pastes the lists side by side, without considering common columns.

How should I go about it ?

Any suggestions will be much appreciated.

Upvotes: 0

Views: 974

Answers (2)

choroba
choroba

Reputation: 241808

Use a hash of hashes to remember the partial tables. The main key is the ID, the key of the inner hash is the file.

#!/usr/bin/perl
use warnings;
use strict;
use feature qw(say);

use Data::Dumper;

my %table;
for my $letter (qw(A B C)) {
    open my $IN, '<', "File$letter.txt" or die "Cannot open: $!";
    <$IN>; # Skip the header
    while (<$IN>) {
        my ($id, $value) = split;
        $table{$id}{$letter} = $value;
    }
}

say "ID\tValueA\tValueB\tValueC";
for my $id (keys %table) {
    say $id, join"\t", q(), map $table{$id}{$_} // 0, qw(A B C);
}

Upvotes: 1

Miguel Prz
Miguel Prz

Reputation: 13792

You can initialize a hash (with the ID as the key), and read each file. When you are reading the FileA.txt, put something like this ($id and $value are the values you get from file):

$my_hash{$id} = {VALUE_A=>$value, VALUE_B=>0, VALUE_C=>0};

when reading FileB.txt, for each line:

if( exists $my_hash{$id} ) {
  $my_hash{$id}->{VALUE_B=>$value};
}
else {
   $my_hash{$id} = {VALUE_A=>0, VALUE_B=>$value, VALUE_C=>0};
}

quite similar for FileC.txt:

if( exists $my_hash{$id} ) {
  $my_hash{$id}->{VALUE_C=>$value};
}
else {
   $my_hash{$id} = {VALUE_A=>0, VALUE_B=>0, VALUE_C=>$value};
}

and finally, you have all your data in %my_hash

Upvotes: 0

Related Questions