theo4786

Perl: print lines with matching fields from two files; warning "Use of uninitialized value in string eq"

I have two files that look like this:

File1:

chr id position a0 a1
22 rs4820378:39869209:C:T 39869209 C T
22 22:16050075:A:G 16050075 A G
22 22:16050115:G:A 16050115 G A
22 rs199694733:39913976:C:CT 39913976 C CT
22 rs139408809:39937958:GC:G 39937958 GC G

File2:

SNP CHR BP A1 A2
rs4820378 22 39869209 C T
rs4821900 22 39869719 G A
rs1984662 22 39869997 T G
rs35629588 22 39913976 I2 D
rs139408809 22 39937958 D I2

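I would like to find lines where fields 1 and 3 from File1 match fields 2 and 3 from File2, and also either

- fields 4 and 5 from File1 match fields 4 and 5 from File2, or
- field 4 is longer than one character in both files, or
- field 5 is longer than one character in both files

Then print out field 2 from File1 and fields 1 and 3 from File2.

My code is below: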

#! perl -w

use strict;
use warnings;

my %kglocus;
open( my $loci_in, "<", "File1" ) or die $!;
while ( <$loci_in> ) {

    next if m/chr/;

    my ( $CHR, $id, $BP, $A1, $A2 ) = split;
    my $reg = "${CHR}_$BP";
    $kglocus{$reg} = [ $CHR, $id, $BP, $A1, $A2 ];
}
close $loci_in;

my $filename = shift @ARGV;
open( my $input, "<", $filename ) or die $!;
while ( <$input> ) {
    next if m/SNP/;

    my ( $SNP, $CHR, $BP, $A1, $A2 ) = split;
    my $reg = "${CHR}_$BP";

    if ( $A1 eq $kglocus{$reg}->[3] and $A2 eq $kglocus{$reg}->[4] ) {

        print "$kglocus{$reg}->[1] $SNP $BP\n";
    }
    elsif ( ( length( $A1 ) > 1 && length( $kglocus{$reg}->[3] ) > 1 ) ||
            ( length( $A2 ) > 1 && length( $kglocus{$reg}->[4] ) > 1 ) ) {

        print "$kglocus{$reg}->[1] $SNP $BP\n";
    }
}

close( $input );

I'm getting the warnings below for all input lines:

Use of uninitialized value in string eq at find_ID.hash.chr22.pl line 23 
Use of uninitialized value in length at find_ID.hash.chr22.pl line 27

Can anyone point out the problem?

Answers (1)

Borodin

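The problem is that the existence of the hash element $kglocus{$reg} is itself your first test, that "Fields 1 and 3 from File1 match fields 2 and 3 from File2". But you treat that test as though it always passes and simply use the element to access fields of the File1 record. For any File2 line whose chromosome and position don't appear in File1, $kglocus{$reg} doesn't exist, so $kglocus{$reg}->[3] and $kglocus{$reg}->[4] are undefined, and Perl warns when you compare them or take their length.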
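A minimal sketch that reproduces the warning, assuming a one-record %kglocus built from the first line of File1 and hard-coded values standing in for the File2 line for rs4821900, whose position 39869719 has no File1 counterpart. The $SIG{__WARN__} handler just collects warnings into an array so they can be inspected; the exact wording of the message can vary between Perl versions (newer ones may name the variable), so only its general shape should be relied on.

```perl
use strict;
use warnings;

# One File1 record, keyed as "CHR_BP" like %kglocus in the question.
my %kglocus = (
    '22_39869209' => [ 22, 'rs4820378:39869209:C:T', 39869209, 'C', 'T' ],
);

# Collect warnings instead of printing them to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Values from the File2 line for rs4821900: position 39869719 is not in File1.
my ( $CHR, $BP, $A1 ) = ( 22, 39869719, 'G' );
my $reg = "${CHR}_$BP";

# $kglocus{$reg} does not exist, so ->[3] is undef and 'eq' warns.
my $matched = ( $A1 eq $kglocus{$reg}->[3] ) ? 1 : 0;

print "matched: $matched\n";
print "warnings: ", scalar @warnings, "\n";
print $warnings[0] if @warnings;
```

The comparison simply fails (undef is treated as an empty string), so the script keeps running; the warning is the only sign that the lookup found nothing.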

You need something like next unless $kglocus{$reg} in there to make it work correctly. I would also prefer to see that value pulled out into a separate variable to avoid indexing the hash over and over again.
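A minimal sketch of that guard on its own, assuming two of the File1 records from the question are already in %kglocus and three keys stand in for File2 lines (one of which, 22_39869719, has no File1 counterpart). Copying the element into $item both skips the missing keys and means the hash is indexed only once per line; the collected-warnings array is there only to show that nothing is emitted.

```perl
use strict;
use warnings;

# Two File1 records keyed as "CHR_BP", taken from the sample data in the question.
my %kglocus = (
    '22_39869209' => [ 22, 'rs4820378:39869209:C:T',    39869209, 'C',  'T' ],
    '22_39937958' => [ 22, 'rs139408809:39937958:GC:G', 39937958, 'GC', 'G' ],
);

# Collect any warnings so we can check that there are none.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my @seen;
for my $key ( '22_39869209', '22_39869719', '22_39937958' ) {
    # Skip File2 keys with no File1 record before touching ->[3] or ->[4].
    next unless my $item = $kglocus{$key};
    push @seen, $item->[1];
}

print "$_\n" for @seen;
print "warnings: ", scalar @warnings, "\n";
```

This prints the two File1 ids and "warnings: 0". Plain hash lookup of a missing top-level key does not create it, so the guard leaves %kglocus unchanged.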

Here's a solution that will work for you:

use strict;
use warnings;
use v5.10.1;
use autodie;

my %kglocus;
{
    open my $in_fh, '<', 'File1';
    while ( <$in_fh> ) {
        next if /chr/;

        my ( $chr, $id, $bp, $a1, $a2 ) = split;
        my $key = "${chr}_$bp";
        $kglocus{$key} = [ $chr, $id, $bp, $a1, $a2 ];
    }
}

{
    my ( $filename ) = @ARGV;
    open my $in_fh, '<', $filename;
    while ( <$in_fh> ) {
        next if /SNP/;

        my ( $snp, $chr, $bp, $a1, $a2 ) = split;
        my $key = "${chr}_$bp";
        next unless my $item = $kglocus{$key};

        if ( $a1 eq $item->[3] and $a2 eq $item->[4]
                or length $a1 > 1 and length $item->[3] > 1
                or length $a2 > 1 and length $item->[4] > 1 ) {

            print "$item->[1] $snp $bp\n";
        }
    }
}
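The combined condition in that loop depends on Perl's precedence rules: and binds more tightly than or, and length (a named unary operator) binds more tightly than >. A minimal sketch that pulls the test into a hypothetical helper, alleles_ok, and checks it against three File2/File1 pairs from the question's sample data plus one made-up pair. The outer parentheses in the assignment matter, because without them my $ok = X and Y would assign only X.

```perl
use strict;
use warnings;

# Hypothetical helper: the allele test from the answer's loop, as a sub.
# $a1 and $a2 are File2 fields 4 and 5; $item is a File1 record
# [ chr, id, position, a0, a1 ].
sub alleles_ok {
    my ( $a1, $a2, $item ) = @_;

    # Parsed as (A and B) or (C and D) or (E and F), because 'and' binds
    # tighter than 'or'; the parentheses make the whole expression the
    # value assigned to $ok.
    my $ok = (
        $a1 eq $item->[3] and $a2 eq $item->[4]
        or length $a1 > 1 and length $item->[3] > 1
        or length $a2 > 1 and length $item->[4] > 1
    );
    return $ok ? 1 : 0;
}

my @cases = (
    # File2 fields 4 and 5, then the matching File1 record, from the sample data.
    [ 'C',  'T',  [ 22, 'rs4820378:39869209:C:T',    39869209, 'C',  'T'  ] ],
    [ 'I2', 'D',  [ 22, 'rs199694733:39913976:C:CT', 39913976, 'C',  'CT' ] ],
    [ 'D',  'I2', [ 22, 'rs139408809:39937958:GC:G', 39937958, 'GC', 'G'  ] ],
    # Made-up pair where field 4 is longer than one character in both files.
    [ 'AT', 'G',  [ 22, 'hypothetical',              1,        'AC', 'T'  ] ],
);

my @results = map { alleles_ok( @$_ ) } @cases;
print "@results\n";    # 1 0 0 1
```

Of the three sample pairs, only the first passes: in the other two, the alleles differ and no single field is longer than one character in both files, so the length branches fail as well.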
