FIJI

Reputation: 9

Perl: Matching perl hash keys from two files and utilizing in new hash

I've just started learning Perl and would appreciate suggestions, relevant examples, or resources for the coding problem described below.

I have two data files with tab-delimited columns, similar to the example below.

File#1:

GeneID  ColA    ColB
Gene01  5   15
Gene02  4   8
Gene03  25  5

File#2:

GeneID  ColA    ColC
Gene01  12  3
Gene03  5   20
Gene05  22  40
Gene06  88  2

The actual files I'm using have more than 50 columns and rows, but they follow the same layout as the examples above.

First, I want to read the files, store the column names of each file in variables, and build one hash per file, using the gene IDs from column 1 as keys and the concatenated values of the other two columns as values.

This way each key maps to a single value holding the rest of its row.

My trouble is the third hash, %commongenes. I need to find the keys that are present in both hashes and store just those keys, together with their associated values from both files, in the third hash. In the above example, this would be the following key-value pairs:

 File1:                  File2: 

 Gene01 5   15          Gene01  12  3
 Gene03 25  5           Gene03  5   20

I know the following if statement is incorrect, but the concatenation of columns from both files is what I'd like to have.

 if ($tmpArray1[0] eq $tmpArray2[0]){
 $commongenes{$tmpArray2[0]} = 
    $tmpArray1[1].':'.$tmpArray1[2].':'.$tmpArray2[1].':'.$tmpArray2[2];
 }

Here is the main body of the code:

 #!/usr/bin/perl -w
 use strict;                    

 my $file1=$ARGV[0];          
 my $file2=$ARGV[1];

 open (FILE1, "<$file1") or die "Cannot open $file1 for processing!\n";        
 open (FILE2, "<$file2") or die "Cannot open $file2 for processing!\n";      

 my @fileLine1=<FILE1>;   
 my @fileLine2=<FILE2>;   

 my %file1_allgenes=();        
 my %file2_allgenes=();        
 my %commongenes =();


 my ($file1_group0name, $file1_group1name, $file1_group2name)=('','','','');
 my ($file2_group0name, $file2_group1name, $file2_group2name)=('','','','');    

 for (my $i=0; $i<=$#fileLine1 && $i<=$#fileLine2; $i++) {   
 chomp($fileLine1[$i]);                                 
 chomp($fileLine2[$i]);                                 
 my @tmpArray1=split('\t',$fileLine1[$i]);               
 my @tmpArray2=split('\t',$fileLine2[$i]);                  

 if ($i==0) {                                 ## Column Names and/or Letters
    $file1_group0name=substr($tmpArray1[0],0,6);         
    $file1_group1name=substr($tmpArray1[1],0,4);         
    $file1_group2name=substr($tmpArray1[2],0,4);         
    $file2_group0name=substr($tmpArray2[0],0,6);         
    $file2_group1name=substr($tmpArray2[1],0,4);         
    $file2_group2name=substr($tmpArray2[2],0,4);         
 }
 if ($i!=0) {     ## Concatenated values in 3 separate hashes                                       
    if (! defined $file1_allgenes{$tmpArray1[0]}) {      
        $file1_allgenes{$tmpArray1[0]}=$tmpArray1[1].':'.$tmpArray1[2];  

    }                                       
    if (! defined $file2_allgenes{$tmpArray2[0]}) {     
        $file2_allgenes{$tmpArray2[0]}=$tmpArray2[1].':'.$tmpArray2[2];             

    }
    if ($tmpArray1[0] eq $tmpArray2[0]){
        $commongenes{$tmpArray2[0]} = 
    $tmpArray1[1].':'.$tmpArray1[2].':'.$tmpArray2[1].':'.$tmpArray2[2];
    }

  }
    my @commongenes = %commongenes;               
    print "@commongenes\n\n";              
  }

Any suggestions are most appreciated.

Upvotes: 0

Views: 182

Answers (1)

choroba

Reputation: 241868

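Your loop compares line i of the first file with line i of the second, so it only finds genes that happen to sit on the same row in both files (Gene01 in your example, but not Gene03), and it stops at the end of the shorter file. Instead, read the first file completely into a hash, then look up each gene from the second file in it. Use a hash of arrays so you don't need to substr and concatenate the strings all the time.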

#!/usr/bin/perl
use warnings;
use strict;

open my $F1, '<', 'file1' or die $!;
<$F1>;  # Skip the header.
my %h;
while (<$F1>) {
    my @cols = split;
    $h{ $cols[0] } = [ @cols[ 1 .. $#cols ] ];
}

my %common;

open my $F2, '<', 'file2' or die $!;
<$F2>;
while (<$F2>) {
    my @cols = split;
    $common{ $cols[0] } = [ @{ $h{ $cols[0] } }, @cols[ 1 .. $#cols ] ]
        if exists $h{ $cols[0] };
}

use Data::Dumper; print Dumper \%common;
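
If you still want the colon-joined strings from your question rather than array references, the same lookup can be followed by a join. The following is only a minimal sketch under a few assumptions: the sample data from the question is inlined as strings so the script runs on its own, the columns are split on tabs, and read_table is a hypothetical helper name introduced here for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Sample data from the question, inlined so the sketch is self-contained.
# With real files, open them by name instead of opening a string reference.
my $file1 = "GeneID\tColA\tColB\nGene01\t5\t15\nGene02\t4\t8\nGene03\t25\t5\n";
my $file2 = "GeneID\tColA\tColC\nGene01\t12\t3\nGene03\t5\t20\nGene05\t22\t40\nGene06\t88\t2\n";

# read_table (hypothetical helper): returns a hash reference that maps
# each gene ID to an array reference holding the remaining columns.
sub read_table {
    my ($text) = @_;
    open my $fh, '<', \$text or die $!;   # in-memory filehandle
    my $header = <$fh>;                   # skip the header line
    my %rows;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gene, @values) = split /\t/, $line;
        $rows{$gene} = \@values;
    }
    return \%rows;
}

my $first  = read_table($file1);
my $second = read_table($file2);

# Keep only the genes present in both tables and join their values.
my %common;
for my $gene (keys %$first) {
    next unless exists $second->{$gene};
    $common{$gene} = join ':', @{ $first->{$gene} }, @{ $second->{$gene} };
}

print "$_\t$common{$_}\n" for sort keys %common;
```

For the sample data this prints Gene01 with 5:15:12:3 and Gene03 with 25:5:5:20. Because every column after the gene ID is collected into the array, the same code should also cope with files that have more than three columns.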

Upvotes: 2
