knytshade
knytshade

Reputation: 31

Perl Use of uninitialized value in addition when all values are initialized

I am attempting to write a script that calculates the order parameter for several carbon to hydrogen bonds and outputs those values. The math is trivial but I am getting a "Use of uninitialized value in addition" error when I attempt to average the values at the end. I am well aware of how common and easily fixed this error is but I have checked all values given, there is a value for all 9212 values ( I checked by printing each, putting this into an excel document and searching for empty cells). I am at a loss and I am not sure how to further debug.

My script takes an input file, goes line by line, takes the x,y,z coords if certain strings are present, does math on these coords (finds the angle between two vectors and the z-axis), should be averaging each $integer section together (so average of all 2's etc). It does this for 3 segments (2-8, 9-10, and 11-18), saves these to two arrays (@theta_values and @theta2_values) and finally it should average each "integer" together to find the average angle between the vector and the z-axis. In total there should be 34 values output, which does happen but each value has a "Use of uninitialized value in addition (+) at angle_checker_v3.pl line 334, line 34303." error, and all averages other than the first are too small.

For reference, line 334 is where I average and line 34303 is the last line of the file.

Some sample data would be:

ATOM   2199  C22 POPC    1      -9.427  11.863  11.706  1.00  0.00      MEMB
ATOM   2200  H2R POPC    1     -10.347  11.662  12.293  1.00  0.00      MEMB
ATOM   2201  H2S POPC    1      -8.968  10.895  11.443  1.00  0.00      MEMB
ATOM   2211  C23 POPC    1      -9.801  12.641  10.423  1.00  0.00      MEMB
ATOM   2212  H3R POPC    1     -10.136  13.667  10.696  1.00  0.00      MEMB
ATOM   2213  H3S POPC    1     -10.658  12.124   9.934  1.00  0.00      MEMB
ATOM   2214  C24 POPC    1      -8.663  12.751   9.396  1.00  0.00      MEMB
ATOM   2215  H4R POPC    1      -7.763  13.166   9.894  1.00  0.00      MEMB
ATOM   2216  H4S POPC    1      -8.961  13.479   8.607  1.00  0.00      MEMB

*I intentionally skipped 10 atoms that did not matter above

In order the columns denote: substance (not relevant), atom number, type of atom, residue number/molecule type, residue number, x-coord, y-coord, z-coord, alpha number (not relevant), beta column (not relevant), and overall molecule type.

TLDR; My averaging script:

#Averaging theta values
for (my $t=2; ($t <= 18); $t++) {
    for (my $j=1; ($j <= $lipid_num); $j++) {
            $sum[$t]= $theta_values[$t][$j] + $sum[$t];
    }
    $average[$t]= $sum[$t] / $lipid_num;
    print "Average theta for carbon $t is $average[$t]\n";
}

#Averaging Theta2 values
for (my $q=2; ($q <= 18); $q++) {
        for (my $b=1; ($b <= $lipid_num); $b++) {
                $sum2[$q]= $theta2_values[$q][$b] + $sum2[$q];
        }
        $average2[$q]= $sum2[$q] / $lipid_num;
        print "Average theta2 for carbon $q is $average2[$q]\n";
}

Does not find values at all positions even though I have verified that there are values at all positions.

This is the full script, I realize how large it is.

        #Usage:                                                                      #
# perl angle_checker.pl [granuphilin_prot-memb_system].pdb 
#!/usr/bin/perl 

use strict;
use warnings;
use Math::Trig;

my $inputfile = $ARGV[0];

open (INPUTFILE, "<", $inputfile) or die $!;

my @data = <INPUTFILE>;

#Quick Change Variables

my $lipid_num = 256;

#Library
my @sum;
my @average;
my @sum2;
my @average2;
my @x1;
my @y1;
my @z1;
my $R = 'R';
my $S = 'S';
my $one = '1';
my @theta_values;
my @theta2_values;
my @vectorCtoHR;
my @vectorCtoHS;
my @normal;

#Start for lipid count
for (my $lipid=1; ($lipid <= $lipid_num); $lipid++) {
    # First Carbon/Integer counter
    for (my $integer= 2; ($integer <= 8); $integer++) {
            #Split line 1
            for (my $line = 0; $line <= $#data; ++$line) {
                    #Search 1.1
                    if(($data[$line] =~ m/\s+C2$integer\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
                            chomp $data[$line];
                            my @splitline = (split /\s+/, $data[$line]);
                            foreach (@splitline) {
                                    $x1[0]= $splitline[5];
                                    $y1[0]= $splitline[6];
                                    $z1[0]= $splitline[7];
                            }
                    }
                    #Search 1.2
                        if(($data[$line] =~ m/\s+H$integer$R\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
                                my @splitline = (split /\s+/, $data[$line]);
                            foreach (@splitline) {
                                    $x1[1]= $splitline[5];
                                        $y1[1]= $splitline[6];
                                        $z1[1]= $splitline[7];
                            }
                    }
                    #Search 1.3
                        if(($data[$line] =~ m/\s+H$integer$S\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
                                my @splitline = (split /\s+/, $data[$line]);
                            foreach (@splitline) {
                                    $x1[2]= $splitline[5];
                                        $y1[2]= $splitline[6];
                                        $z1[2]= $splitline[7];
                            }
                    }
            }



    #Z-axis
    $normal[0]= 0;
    $normal[1]= 0;
    $normal[2]= 100;

    #Vector 1
    $vectorCtoHR[0]=($x1[0] - ($x1[1]));
    $vectorCtoHR[1]=($y1[0] - ($y1[1]));
    $vectorCtoHR[2]=($z1[0] - ($z1[1]));

    #Vector 2
    $vectorCtoHS[0]=($x1[0] - ($x1[2]));
        $vectorCtoHS[1]=($y1[0] - ($y1[2]));
        $vectorCtoHS[2]=($z1[0] - ($z1[2]));

    #First Angle

    my $x1mag = sqrt(($vectorCtoHS[0]**2)+($vectorCtoHS[1]**2)+($vectorCtoHS[2]**2));
    my $x2mag = sqrt(($normal[0]**2)+($normal[1]**2)+($normal[2]**2));

    #Dot product 
    my $dotproduct = (($vectorCtoHS[0]*$normal[0])+($vectorCtoHS[1]*$normal[1])+($vectorCtoHS[2]*$normal[2]));

    my $theta = acos($dotproduct/($x1mag*$x2mag));
    $theta_values[$integer][$lipid]= $theta;

    # Second Angle  
        my $x3mag = sqrt(($vectorCtoHR[0]**2)+($vectorCtoHR[1]**2)+($vectorCtoHR[2]**2));

        my $dotproduct2 = (($vectorCtoHR[0]*$normal[0])+($vectorCtoHR[1]*$normal[1])+($vectorCtoHR[2]*$normal[2]));

        my $theta2 = acos($dotproduct2/($x3mag*$x2mag));
        $theta2_values[$integer][$lipid]= $theta2;
    }
    #Section 2 Search These only have one hydrogen to search for, hence 1 less search
    for (my $integer = 9; ($integer <= 10); $integer++) {
                for (my $line = 0; $line <= $#data; ++$line) {
                        if(($data[$line] =~ m/\s+C2$integer\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
                                chomp $data[$line];
                                my @splitline = (split /\s+/, $data[$line]);
                                foreach (@splitline) {
                                        $x1[0]= $splitline[5];
                                        $y1[0]= $splitline[6];
                                        $z1[0]= $splitline[7];
                                }
                        }
                    if(($data[$line] =~ m/\s+H$integer$one\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
                                my @splitline = (split /\s+/, $data[$line]);
                                foreach (@splitline) {
                                        $x1[1]= $splitline[5];
                                        $y1[1]= $splitline[6];
                                        $z1[1]= $splitline[7];
                                }
                        }
                }
    $normal[0]= 0;
    $normal[1]= 0;
    $normal[2]= 100;
    $vectorCtoHR[0]=($x1[0] - ($x1[1]));
    $vectorCtoHR[1]=($y1[0] - ($y1[1]));
    $vectorCtoHR[2]=($z1[0] - ($z1[1]));

    my $x1mag = sqrt(($vectorCtoHR[0]**2)+($vectorCtoHR[1]**2)+($vectorCtoHR[2]**2));
    my $x2mag = sqrt(($normal[0]**2)+($normal[1]**2)+($normal[2]**2));

    #Dot product 
    my $dotproduct = (($vectorCtoHR[0]*$normal[0])+($vectorCtoHR[1]*$normal[1])+($vectorCtoHR[2]*$normal[2]));

    my $theta = acos($dotproduct/($x1mag*$x2mag));
    $theta_values[$integer][$lipid]= $theta;
    $theta2_values[$integer][$lipid]= $theta;
    }

    #Effectively the same as section 1
    for (my $integer= 11; ($integer <= 18); $integer++) {
            for (my $line = 0; $line <= $#data; ++$line) {
                    if(($data[$line] =~ m/\s+C2$integer\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
                            chomp $data[$line];
                            my @splitline = (split /\s+/, $data[$line]);
                            foreach (@splitline) {
                                    $x1[0]= $splitline[5];
                                    $y1[0]= $splitline[6];
                                    $z1[0]= $splitline[7];
                            }
                    }
                        if(($data[$line] =~ m/\s+H$integer$R\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
                                my @splitline = (split /\s+/, $data[$line]);
                            foreach (@splitline) {
                                    $x1[1]= $splitline[5];
                                        $y1[1]= $splitline[6];
                                        $z1[1]= $splitline[7];
                            }
                    }
                        if(($data[$line] =~ m/\s+H$integer$S\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
                                my @splitline = (split /\s+/, $data[$line]);
                            foreach (@splitline) {
                                    $x1[2]= $splitline[5];
                                        $y1[2]= $splitline[6];
                                        $z1[2]= $splitline[7];
                            }
                    }
            }
    $normal[0]= 0;
    $normal[1]= 0;
    $normal[2]= 100;

    $vectorCtoHR[0]=($x1[0] - ($x1[1]));
    $vectorCtoHR[1]=($y1[0] - ($y1[1]));
    $vectorCtoHR[2]=($z1[0] - ($z1[1]));

    $vectorCtoHS[0]=($x1[0] - ($x1[2]));
        $vectorCtoHS[1]=($y1[0] - ($y1[2]));
        $vectorCtoHS[2]=($z1[0] - ($z1[2]));

    #First Angle

    my $x1mag = sqrt(($vectorCtoHS[0]**2)+($vectorCtoHS[1]**2)+($vectorCtoHS[2]**2));
    my $x2mag = sqrt(($normal[0]**2)+($normal[1]**2)+($normal[2]**2));

    #Dot product 
    my $dotproduct = (($vectorCtoHS[0]*$normal[0])+($vectorCtoHS[1]*$normal[1])+($vectorCtoHS[2]*$normal[2]));

    my $theta = acos($dotproduct/($x1mag*$x2mag));
    $theta_values[$integer][$lipid]= $theta;
    }
print "done with $lipid\n";
#End of lipid search
}
#Averaging starts now

#Averaging theta values
for (my $t=2; ($t <= 18); $t++) {
    for (my $j=1; ($j <= $lipid_num); $j++) {
            $sum[$t]= $theta_values[$t][$j] + $sum[$t];
    }
    $average[$t]= $sum[$t] / $lipid_num;
    print "Average theta for carbon $t is $average[$t]\n";
}

#Averaging Theta2 values
for (my $q=2; ($q <= 18); $q++) {
        for (my $b=1; ($b <= $lipid_num); $b++) {
                $sum2[$q]= $theta2_values[$q][$b] + $sum2[$q];
        }
        $average2[$q]= $sum2[$q] / $lipid_num;
        print "Average theta2 for carbon $q is $average2[$q]\n";
}

Upvotes: 3

Views: 660

Answers (2)

knytshade
knytshade

Reputation: 31

Turns out I was adding my $sum[$t], which did not have a value, to something that did and this was giving the error. To fix this I changed from:

$sum[$t]= $theta_values[$t][$j] + $sum[$t];

To:

$sum[$t]+= $theta_values[$t][$j];

Thank you all for the help.

Upvotes: 0

Marty
Marty

Reputation: 2808

Without the input data it's impossible to reproduce the problem locally making it near impossible to help with debugging. Having looked at your code though, there a several things I can suggest that will simplify the code and hopefully make it easier to find the problem.

Firstly, nearly all of your loops iterate over an integer variable between two values in C-style for loop. That form of for is there if you absolutely need it but perl has far more expressive - and therefore far easier to read and understand the authors intent - forms of a for loop.

Where you simply need an integer range; eg

for (my $integer= 2; ($integer <= 8); $integer++) { 

you can simply state "I want $integer to go from 2 to 8";

for my $integer (2 .. 8)

Where you're using the integer solely for the purpose of indexing back into an array to get at the contents, you can simply tell perl you want to iterate over the array contents - ie instead of;

for (my $line = 0; $line <= $#data; ++$line) {
    if(($data[$line] =~ ... etc ...
    chomp $data[$line];

you can more simply;

for my $line (@data) {
    if(($line =~ ... etc ...
    chomp $line;

Secondly, where you've got several regexs in play, it helps to separate their definition from their use. It allows the reader to consume and understand the regex itself separately from (later) seeing how/why it being applied. Also, 'extended mode' regex allow whitespace in the regex definition. It's simply amazing how much easier it is to read regexes (regexen?) with whitespace - so much so, that one should consider making it a rule to simply just always use /x. Togeather, we can replace;

if(($data[$line] =~ m/\s+C2$integer\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {

with;

my $has_POPC        = qr/ \s+  POPC        \s+ /x;
my $has_lipid       = qr/ \s+  $lipid      \s+ /x;
my $has_C2_integer  = qr/ \s+  C2 $integer \s+ /x;

if( $line =~ $has_C2_integer  &&  $line =~ $has_lipid  &&  $line =~ $has_POPC) {

Thirdly - perhaps most imprtantly because I think its potentially a bug - within several of your inner loops you have;

my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
    $x1[0]= $splitline[5];
    $y1[0]= $splitline[6];
    $z1[0]= $splitline[7];
}

Again - I don't have the input data and therefore can't check - but this is almost certainly a mistake. You're splitting the line on whitespace - for the sake of discussion, let say it has 10 "pieces". You then iterate over those pieces (putting them into the default topic, $_) but make no reference to the topical piece itself - ie, you're not using $_. Therefore, the code is putting pieces 5, 6 & 7 into x1, y1 & z1 (respectively) - 10 times over. Now, it probably doesn't matter but as I said, its almost certainly not what you wanted and therefore is a bug waiting to happen. You may (its a matter of balance between terseness vs readabillity) want to consolidate the three assignments into list form and (again, optionally) eliminate the temporary variable, @splitline;

if( $line =~ $has_C2_integer  &&  $line =~ $has_lipid  &&  $line =~ $has_POPC) {
    ( $x1[0], $y1[0], $z1[0] ) = (split /\s+/ $line)[ 5, 6, 7 ];
}

Putting these ideas togeather enables replacing this;

#Start for lipid count
for (my $lipid=1; ($lipid <= $lipid_num); $lipid++) {
    # First Carbon/Integer counter
    for (my $integer= 2; ($integer <= 8); $integer++) {
        #Split line 1
        for (my $line = 0; $line <= $#data; ++$line) {
                #Search 1.1
                if(($data[$line] =~ m/\s+C2$integer\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
                        chomp $data[$line];
                        my @splitline = (split /\s+/, $data[$line]);
                        foreach (@splitline) {
                                $x1[0]= $splitline[5];
                                $y1[0]= $splitline[6];
                                $z1[0]= $splitline[7];
                        }
                }
                #Search 1.2
                    if(($data[$line] =~ m/\s+H$integer$R\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
                            my @splitline = (split /\s+/, $data[$line]);
                        foreach (@splitline) {
                                $x1[1]= $splitline[5];
                                    $y1[1]= $splitline[6];
                                    $z1[1]= $splitline[7];
                        }
                }
                #Search 1.3
                    if(($data[$line] =~ m/\s+H$integer$S\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
                            my @splitline = (split /\s+/, $data[$line]);
                        foreach (@splitline) {
                                $x1[2]= $splitline[5];
                                    $y1[2]= $splitline[6];
                                    $z1[2]= $splitline[7];
                        }
                }
        }

with ;

my $has_POPC = qr/ \s+ POPC \s+ /x;
#Start for lipid count
for my $lipid (1 .. $lipid_num) {
    my $has_lipid = qr/ \s+ $lipid \s+ /x;

    # First Carbon/Integer counter
    for my $integer (2 .. 8) {
        my $has_C2_integer  = qr/ \s+ C2 $integer   \s+ /x;
        my $has_H_integer_R = qr/ \s+ H  $integer R \s+ /x;
        my $has_H_integer_S = qr/ \s+ H  $integer S \s+ /x;

        #Split line 1
        for my $line (@data) {
            chomp $line;

            #Search 1.1
            if ($line =~ $has_C2_integer && $line =~ $has_lipid && $line =~ $has_POPC)  {
                ($x1[0], $y1[0], $z1[0]) = (split /\s+/ $line)[ 5, 6, 7 ];
            }

            #Search 1.2
            if ($line =~ $has_H_integer_R && $line =~ $has_lipid && $line =~ $has_POPC) {
                ($x1[1], $y1[1], $z1[1]) = (split /\s+/ $line)[ 5, 6, 7 ];
            }

            #Search 1.3
            if ($line =~ $has_H_integer_S && $line =~ $has_lipid && $line =~ $has_POPC) {
                ($x1[2], $y1[2], $z1[2]) = (split /\s+/ $line)[ 5, 6, 7 ];
            }
        }
    }
}

...that's a reduction in the number of lines by about a third and is (arguably) more readable. Later in the program the structure is repeated almost verbatim, so you should be able to obtain the same reduction again. This has to be rigorously checked of course, which I can't do. Finally, have a good look at the Perl debugger. It only takes about 15 minutes to get across the basics and it will repay you 10 times over (at least).

Upvotes: 3

Related Questions