Reputation: 7045
I have written a script in Perl that merges files based on a mapping in a third file; the reason I am not using join
is because lines won't always match. The code works, but gives an error that doesn't appear to affect output: Use of uninitialized value in join or string at join.pl line 43, <$fh> line 21.
As I am relatively new to Perl I have been unable to understand what is causing this error. Any help resolving this error or advice about my code would be greatly appreciated. I have provided example input and output below.
join.pl
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use Tie::File;
use Scalar::Util qw(looks_like_number);
chomp( my $infile = $ARGV[0] );
chomp( my $infile1 = $ARGV[1] );
chomp( my $infile2 = $ARGV[2] );
chomp( my $outfile = $ARGV[3] );
open my $mapfile, '<', $infile or die "Could not open $infile: $!";
open my $file1, '<', $infile1 or die "Could not open $infile1: $!";
open my $file2, '<', $infile2 or die "Could not open $infile2: $!";
tie my @tieFile1, 'Tie::File', $infile1 or die "Could not open $infile1: $!";
tie my @tieFile2, 'Tie::File', $infile2 or die "Could not open $infile2: $!";
open my $output, '>', $outfile or die "Could not open $outfile: $!";
my %map1;
my %map2;
# This loop will read two input files and populate two hashes
# using the coordinates (field 2) and the current line number
while ( my $line1 = <$file1>, my $line2 = <$file2> ) {
my @row1 = split( "\t", $line1 );
my @row2 = split( "\t", $line2 );
# $. holds the line number
$map1{$row1[1]} = $.;
$map2{$row2[1]} = $.;
}
close($file1);
close($file2);
while ( my $line = <$mapfile> ) {
chomp $line;
my @row = split( "\t", $line );
my $species1 = $row[1];
my $reference1 = $map1{$species1};
my $species2 = $row[3];
my $reference2 = $map2{$species2};
my @nomatch = ("NA", "", "NA", "", "", "", "", "NA", "NA");
# test numeric
if ( looks_like_number($reference1) && looks_like_number($reference2) ) {
# do the do using the maps
print $output join("\t", $tieFile1[$reference1], $tieFile2[$reference2]), "\n";
}
elsif ( looks_like_number($reference1) )
{
print $output join("\t", $tieFile1[$reference1], @nomatch), "\n";
}
elsif ( looks_like_number($reference2) )
{
print $output join("\t", @nomatch, $tieFile2[$reference2]), "\n";
}
}
close($output);
untie @tieFile1;
untie @tieFile2;
input_1:
Scf_3L 12798910 T 0 41 0 0 NA NA
Scf_3L 12798911 C 0 0 43 0 NA NA
Scf_3L 12798912 A 42 0 0 0 NA NA
Scf_3L 12798913 G 0 0 0 44 NA NA
Scf_3L 12798914 T 0 42 0 0 NA NA
Scf_3L 12798915 G 0 0 0 44 NA NA
Scf_3L 12798916 T 0 42 0 0 NA NA
Scf_3L 12798917 A 41 0 0 0 NA NA
Scf_3L 12798918 G 0 0 0 43 NA NA
Scf_3L 12798919 T 0 43 0 0 NA NA
Scf_3L 12798920 T 0 41 0 0 NA NA
input_2:
3L 12559896 T 0 31 0 0 NA NA
3L 12559897 C 0 0 33 0 NA NA
3L 12559898 A 34 0 0 0 NA NA
3L 12559899 G 0 0 0 33 NA NA
3L 12559900 T 0 34 0 0 NA NA
3L 12559901 G 0 0 0 33 NA NA
3L 12559902 T 0 33 0 0 NA NA
3L 12559903 A 33 0 0 0 NA NA
3L 12559904 G 0 0 0 33 NA NA
3L 12559905 T 0 34 0 0 NA NA
3L 12559906 T 0 33 0 0 NA NA
map:
3L 12798910 T 12559896 T
3L 12798911 C 12559897 C
3L 12798912 A 12559898 A
3L 12798913 G 12559899 G
3L 12798914 T 12559900 T
3L 12798915 G 12559901 G
3L 12798916 T 12559902 T
3L 12798917 A 12559903 A
3L 12798918 G 12559904 G
3L 12798919 T 12559905 T
3L 12798920 T 12559906 T
output:
Scf_3L 12798910 T 0 41 0 0 NA NA 3L 12559896 T 0 31 0 0 NA NA
Scf_3L 12798911 C 0 0 43 0 NA NA 3L 12559897 C 0 0 33 0 NA NA
Scf_3L 12798912 A 42 0 0 0 NA NA 3L 12559898 A 34 0 0 0 NA NA
Scf_3L 12798913 G 0 0 0 44 NA NA 3L 12559899 G 0 0 0 33 NA NA
Scf_3L 12798914 T 0 42 0 0 NA NA 3L 12559900 T 0 34 0 0 NA NA
Scf_3L 12798915 G 0 0 0 44 NA NA 3L 12559901 G 0 0 0 33 NA NA
Scf_3L 12798916 T 0 42 0 0 NA NA 3L 12559902 T 0 33 0 0 NA NA
Scf_3L 12798917 A 41 0 0 0 NA NA 3L 12559903 A 33 0 0 0 NA NA
Scf_3L 12798918 G 0 0 0 43 NA NA 3L 12559904 G 0 0 0 33 NA NA
Scf_3L 12798919 T 0 43 0 0 NA NA 3L 12559905 T 0 34 0 0 NA NA
Scf_3L 12798920 T 0 41 0 0 NA NA 3L 12559906 T 0 33 0 0 NA NA
Upvotes: 0
Views: 67
Reputation: 126722
The immediate problem is that the indices of the tied arrays start at zero, while the line numbers in $.
start at 1. That means you need to subtract one from $.
or from the $reference
variables before using them as indices. So your resulting data was never correct in the first place, and you may have overlooked that if it weren't for the warning!
I fixed that and also tidied up your code a little. I mostly added use autodie
so that there's no need to check the status of IO operations (except for Tie::File
), changed to list assignments, moved the code to read the files into a subroutine, and added code blocks so that the lexical file handles would be closed automatically
I also used the tied arrays to build the %map
hashes instead of opening the files separately, which means their values are already zero-based as they need to be
Oh, and I removed looks_like_number
, because the $reference
variables must be either numeric or undef
because that's all we put into the hash. The correct way to check that a value isn't undef
is with the defined
operator
#!/usr/bin/perl
use strict;
use warnings 'all';
use autodie;
use Fcntl 'O_RDONLY';
use Tie::File;
my ( $mapfile, $infile1, $infile2, $outfile ) = @ARGV;
{
tie my @file1, 'Tie::File' => $infile1, mode => O_RDONLY
or die "Could not open $infile1: $!";
tie my @file2, 'Tie::File' =>$infile2, mode => O_RDONLY
or die "Could not open $infile2: $!";
my %map1 = map { (split /\t/, $file1[$_], 3)[1] => $_ } 0 .. $#file1;
my %map2 = map { (split /\t/, $file2[$_], 3)[1] => $_ } 0 .. $#file2;
open my $map_fh, '<', $mapfile;
open my $out_fh, '>', $outfile;
while ( <$map_fh> ) {
chomp;
my @row = split /\t/;
my ( $species1, $species2 ) = @row[1,3];
my $reference1 = $map1{$species1};
my $reference2 = $map2{$species2};
my @nomatch = ( "NA", "", "NA", "", "", "", "", "NA", "NA" );
my @fields = (
( defined $reference1 ? $file1[$reference1] : @nomatch),
( defined $reference2 ? $file2[$reference2] : @nomatch),
);
print $out_fh join( "\t", @fields ), "\n";
}
}
Scf_3L 12798910 T 0 41 0 0 NA NA NA NA NA NA
Scf_3L 12798911 C 0 0 43 0 NA NA NA NA NA NA
Scf_3L 12798912 A 42 0 0 0 NA NA NA NA NA NA
Scf_3L 12798913 G 0 0 0 44 NA NA NA NA NA NA
Scf_3L 12798914 T 0 42 0 0 NA NA NA NA NA NA
Scf_3L 12798915 G 0 0 0 44 NA NA NA NA NA NA
Scf_3L 12798916 T 0 42 0 0 NA NA NA NA NA NA
Scf_3L 12798917 A 41 0 0 0 NA NA NA NA NA NA
Scf_3L 12798918 G 0 0 0 43 NA NA NA NA NA NA
Scf_3L 12798919 T 0 43 0 0 NA NA NA NA NA NA
Scf_3L 12798920 T 0 41 0 0 NA NA NA NA NA NA
Upvotes: 2