Reputation: 9
I just started learning Perl and was wondering if anyone could provide suggestions, relevant examples, or resources for the coding problem described below.
I have two data files with tab-delimited columns, similar to the example below.
File#1:
GeneID ColA ColB
Gene01 5 15
Gene02 4 8
Gene03 25 5
File#2:
GeneID ColA ColC
Gene01 12 3
Gene03 5 20
Gene05 22 40
Gene06 88 2
The actual files I'm using have >50 columns and rows, but are similar to what's above.
First, I want to read in the files, store the column names for each file in variables, and build one hash per file that uses the genes in column 1 as keys and the concatenated values of the other two columns as the value for each key.
This way each key maps to exactly one value in the hash.
My trouble is the third hash, %commongenes. I need to find the keys that are present in both hashes and store just those keys, with their associated values from both files, in the third hash. In the above example, this would be the following key-value pairs:
File1:            File2:
Gene01 5 15       Gene01 12 3
Gene03 25 5       Gene03 5 20
I know the following if statement is incorrect, but the concatenation of columns from both files is what I'd like to have.
if ($tmpArray1[0] eq $tmpArray2[0]){
    $commongenes{$tmpArray2[0]} =
        $tmpArray1[1].':'.$tmpArray1[2].':'.$tmpArray2[1].':'.$tmpArray2[2];
}
Here is the main body of the code below:
#!/usr/bin/perl -w
use strict;
my $file1=$ARGV[0];
my $file2=$ARGV[1];
open (FILE1, "<$file1") or die "Cannot open $file1 for processing!\n";
open (FILE2, "<$file2") or die "Cannot open $file2 for processing!\n";
my @fileLine1=<FILE1>;
my @fileLine2=<FILE2>;
my %file1_allgenes=();
my %file2_allgenes=();
my %commongenes =();
my ($file1_group0name, $file1_group1name, $file1_group2name)=('','','');
my ($file2_group0name, $file2_group1name, $file2_group2name)=('','','');
for (my $i=0; $i<=$#fileLine1 && $i<=$#fileLine2; $i++) {
    chomp($fileLine1[$i]);
    chomp($fileLine2[$i]);
    my @tmpArray1=split('\t',$fileLine1[$i]);
    my @tmpArray2=split('\t',$fileLine2[$i]);
    if ($i==0) { ## Column Names and/or Letters
        $file1_group0name=substr($tmpArray1[0],0,6);
        $file1_group1name=substr($tmpArray1[1],0,4);
        $file1_group2name=substr($tmpArray1[2],0,4);
        $file2_group0name=substr($tmpArray2[0],0,6);
        $file2_group1name=substr($tmpArray2[1],0,4);
        $file2_group2name=substr($tmpArray2[2],0,4);
    }
    if ($i!=0) { ## Concatenated values in 3 separate hashes
        if (! defined $file1_allgenes{$tmpArray1[0]}) {
            $file1_allgenes{$tmpArray1[0]}=$tmpArray1[1].':'.$tmpArray1[2];
        }
        if (! defined $file2_allgenes{$tmpArray2[0]}) {
            $file2_allgenes{$tmpArray2[0]}=$tmpArray2[1].':'.$tmpArray2[2];
        }
        if ($tmpArray1[0] eq $tmpArray2[0]){
            $commongenes{$tmpArray2[0]} =
                $tmpArray1[1].':'.$tmpArray1[2].':'.$tmpArray2[1].':'.$tmpArray2[2];
        }
    }
    my @commongenes = %commongenes;
    print "@commongenes\n\n";
}
Any suggestions are most appreciated.
Upvotes: 0
Views: 182
Reputation: 241868
Use a hash of arrays so you don't need to substr and concatenate the strings all the time.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

open my $F1, '<', 'file1' or die $!;
<$F1>; # Skip the header.
my %h;
while (<$F1>) {
    my @cols = split; # Split $_ on whitespace.
    # Key on the gene ID, store the remaining columns as an array reference.
    $h{ $cols[0] } = [ @cols[ 1 .. $#cols ] ];
}

my %common;
open my $F2, '<', 'file2' or die $!;
<$F2>; # Skip the header.
while (<$F2>) {
    my @cols = split;
    # Keep only genes already seen in file1, appending the file2 columns.
    $common{ $cols[0] } = [ @{ $h{ $cols[0] } }, @cols[ 1 .. $#cols ] ]
        if exists $h{ $cols[0] };
}

print Dumper \%common;
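If you still want the colon-separated strings and the column names from the question, the same hash-of-arrays idea can be extended roughly as follows. This is only a sketch: the in-memory arrays @file1 and @file2 stand in for the real files, and read_table is a made-up helper name, not something from a module.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Assumption: in-memory stand-ins for the two tab-separated files.
my @file1 = ("GeneID\tColA\tColB", "Gene01\t5\t15", "Gene02\t4\t8", "Gene03\t25\t5");
my @file2 = ("GeneID\tColA\tColC", "Gene01\t12\t3", "Gene03\t5\t20",
             "Gene05\t22\t40", "Gene06\t88\t2");

# read_table is a hypothetical helper. It takes the header line followed by
# the data lines and returns the column names plus a hash of arrays keyed
# on the first column.
sub read_table {
    my ($header, @rows) = @_;
    my @names = split /\t/, $header;
    my %genes;
    for my $row (@rows) {
        my ($id, @values) = split /\t/, $row;
        $genes{$id} = [@values];
    }
    return (\@names, \%genes);
}

my ($names1, $genes1) = read_table(@file1);
my ($names2, $genes2) = read_table(@file2);

# Keep only the genes present in both tables and join their values with ':'.
my %commongenes;
for my $id (grep { exists $genes2->{$_} } keys %$genes1) {
    $commongenes{$id} = join ':', @{ $genes1->{$id} }, @{ $genes2->{$id} };
}

print "$_ => $commongenes{$_}\n" for sort keys %commongenes;
```

Because every key of the first hash is checked against the second with exists, the rows no longer have to sit at the same line position in both files, which is what the lockstep comparison in the question requires.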
Upvotes: 2