Victor.H
Victor.H

Reputation: 157

is it possible using two keys and values to merge multi files?

I try to merge 3 files
File 1: 4 columns tab-delimited file

ID   Column_1    Column_2     Column_3    
 A     100       100001         X   
 B     100        99999         Y     
 C     100        88888         Z    
 D      99       100001         Y   
 E      99        88888         Z       

File 2: 3 columns tab-delimited file

  Column_4    Column_5     Column_6    
  100       100001         X   
  100        99999         Y     
  100        88888         Z    
   99       100001         Y   
   99        88888         Z 

File 3: 4 columns tab-delimited file

 Column_7    Column_8     Column_9   Column_10   
    100       120000        100       100001
    100        66666        100        99999
    100        77777        100        88888
     99        100000        99       100001
     99        44444         99        88888  

I want to make a merge file

 ID  Column_1   Column_2   Column_3  Column_6  Column_7  Column_8  
 A     100       100001         X       X       100       120000 
 B     100        99999         Y       Y       100        66666
 C     100        88888         Z       Z       100        77777
 D      99       100001         Y       Y        99       100000   
 E      99        88888         Z       Z        99        44444  

I try to use hash approach depending on column 1 and 2. But I just found out that I have two keys and many values. How can I use hash to parse these files?

Upvotes: 2

Views: 70

Answers (1)

Stefan Becker
Stefan Becker

Reputation: 5962

You are on the right path with a hash, you just have to calculate the key from the columns for each table. Di-section of the solution:

  • read each row from the 3 TSV files using Text::CSV
    • for each row call the table-specific code ref
    • extract the 2 columns that are needed for the key and concatenate them to the key string $key
    • for the first table: save key in the array @order, i.e. it defines the order of the output table
    • in the hash %table under key $key: push the columns that go from this table to the final table onto an array ref
  • loop over @order
    • fetch the array ref under $key from hash %table
    • dump row as TSV using Text::CSV to STDOUT
#!/usr/bin/perl
use warnings;
use strict;
use autodie;

use Text::CSV;

my $csv = Text::CSV->new({
    binary   => 1,
    eol      => "\n",
    sep_char => "\t",
}) or die "CSV creation\n";

sub read_file($$) {
    my($file, $code) = @_;
    open(my $fh, '<', $file);
    while (my $row = $csv->getline( $fh )) {
        $code->($row);
    }
    $csv->eof or $csv->error_diag();
    close($fh);
}

# Output table + row order
my %table;
my @order;

# Table 1
read_file($ARGV[0], sub {
    my($row) = @_;

    #print "ROW 1 @{ $row }\n";
    my($col1, $col2) = @{ $row }[1,2];

    # column_1, column_2 define key
    my $key = "${col1}${col2}";
    #print "KEY 1 ${key}\n";

    # table 1 defines order
    push(@order, $key);
    # ID, column_1, column_2, column_3 from table 1
    $table{$key} = $row;
});

# Table 2
read_file($ARGV[1], sub {
    my($row) = @_;

    #print "ROW 2 @{ $row }\n";
    my($col4, $col5, $col6) = @{ $row };

    # column_4, column_5 define key
    my $key = "${col4}${col5}";
    #print "KEY 2 ${key}\n";

    # column_6 from table 2
    push(@{ $table{$key} }, $col6);
});

# Table 3
read_file($ARGV[2], sub {
    my($row) = @_;

    #print "ROW 3 @{ $row }\n";
    my($col7, $col8, $col9, $col10) = @{ $row };

    # column_7, column_10 define key
    my $key = "${col7}${col10}";
    #print "KEY 3 ${key}\n";

    # column_7, column_8 from table 2
    push(@{ $table{$key} }, $col7, $col8);
});

foreach my $key (@order) {
    $csv->print(\*STDOUT, $table{$key});
}

exit 0;

Test run:

$ perl dummy.pl dummy1.txt dummy2.txt dummy3.txt 
A       100     100001  X       X       100     120000
B       100     99999   Y       Y       100     66666
C       100     88888   Z       Z       100     77777
D       99      100001  Y       Y       99      100000
E       99      88888   Z       Z       99      44444

Upvotes: 2

Related Questions