Reputation: 1
I'm fairly inexperienced with coding, but I often use Perl to merge files and match ID's and information between two files. I have just tried matching two files using a program I have used many times previously, but this time it's not working and I don't understand why.
Here is the code:
use strict;
use warnings;
use vars qw($damID $damF $damAHC $prog $hash1 %hash1 $info1 $ID $sire $dam $F $FB $AHC $FA $hash2 %hash2 $info2);
open (FILE1, "<damF.txt") || die "$!\n Couldn't open damF.txt\n";
my $N = 1;
while (<FILE1>){
chomp (my $line=$_);
next if 1..$N==$.;
my ($damID, $damF, $damAHC, $prog) = split (/\t/, $line);
if ($prog){
$hash1 -> {$prog} -> {info1} = "$damID\t$damF\t$damAHC";
}
open (FILE2, "<whole pedigree_F.txt") || die "$!\n whole pedigree_F.txt \n";
open (Output, ">Output.txt")||die "Can't Open Output file";
while (<FILE2>){
chomp (my $line=$_);
next if 1..$N==$.;
my ($ID, $sire, $dam, $F, $FB, $AHC, $FA) = split (/\t/, $line);
if ($ID){
$hash2 -> {$ID} -> {info2} = "$F\t$AHC";
}
if ($ID && ($hash1->{$prog})){
$info1 = $hash1 -> {$prog} -> {info1};
$info2 = $hash2 -> {$ID} -> {info2};
print "$ID\t$info2\t$info1\n";
}
}
}
close(FILE1);
close FILE2;
close Output;
print "Done!\n";
and these snippets from the two input file formats:
File 1:
501093 0 0 3162
2958 0 0 3163
1895 0 0 3164
1382 0 0 3165
2869 0 0 3166
2361 0 0 3167
754 0 0 3168
3163 0 0 3169
File 2:
49327 20543 49325 0.077 0.4899 0.808 0.0484
49328 15247 49326 0.0755 0.5232 0.8972 0.0499
49329 27823 49327 0.0834 0.5138 0.8738 0.0541
I want to match the values from column 4 in file 1
, with column 1 in file 2
.
Then I also want to print the matching values from columns 2 and 3 in file 1
and columns 3 and 5 in file 2
.
Also, it is probably worth mentioning there are about 500000 entries on each file.
This is the output I am getting:
11476 0.0362 0.3237 501093 0 0
11477 0.0673 0.4768 501093 0 0
11478 0.0443 0.2619 501093 0 0
Note that it isn’t looping through the first hash that I created.
Upvotes: 0
Views: 86
Reputation: 164759
Create two tables in SQLite. Load the TSVs into them. Do a SQL join. It will be simpler and faster.
Refer to this answer about how to load data into SQLite. In your case you want .mode tabs
.
sqlite> create table file1 ( col1 int, col2 int, col3 int, col4 int );
sqlite> create table file2 ( col1 int, col2 int, col3 int, col4 numeric, col5 numeric, col6 numeric, col7 numeric );
sqlite> .mode tabs
sqlite> .import /path/to/file1 file1
sqlite> .import /path/to/file2 file2
There's any number of ways to improve those tables, but I don't know what your data is. Use better names in your own. You'll also want to declare things like primary and foreign keys as well as indexes to speed things up.
Now you have your data in an easy to manipulate format using a well known query language, not a bunch of custom code.
I want to match the values from column 4 in file 1, with column 1 in file 2.
Then I also want to print the matching values from columns 2 and 3 in file 1 and columns 3 and 5 in file 2.
You can do this with a SQL join between the two tables.
select file1.col2, file1.col3, file2.col3, file2.col5
from file1
join file2 on file1.col4 = file2.col1
Upvotes: 1