Reputation: 31
I am attempting to write a script that calculates the order parameter for several carbon to hydrogen bonds and outputs those values. The math is trivial but I am getting a "Use of uninitialized value in addition" error when I attempt to average the values at the end. I am well aware of how common and easily fixed this error is but I have checked all values given, there is a value for all 9212 values ( I checked by printing each, putting this into an excel document and searching for empty cells). I am at a loss and I am not sure how to further debug.
My script takes an input file, goes line by line, takes the x,y,z coords if certain strings are present, does math on these coords (finds the angle between two vectors and the z-axis), should be averaging each $integer section together (so average of all 2's etc). It does this for 3 segments (2-8, 9-10, and 11-18), saves these to two arrays (@theta_values and @theta2_values) and finally it should average each "integer" together to find the average angle between the vector and the z-axis. In total there should be 34 values output, which does happen but each value has a "Use of uninitialized value in addition (+) at angle_checker_v3.pl line 334, line 34303." error, and all averages other than the first are too small.
For reference, line 334 is where I average and line 34303 is the last line of the file.
Some sample data would be:
ATOM 2199 C22 POPC 1 -9.427 11.863 11.706 1.00 0.00 MEMB
ATOM 2200 H2R POPC 1 -10.347 11.662 12.293 1.00 0.00 MEMB
ATOM 2201 H2S POPC 1 -8.968 10.895 11.443 1.00 0.00 MEMB
ATOM 2211 C23 POPC 1 -9.801 12.641 10.423 1.00 0.00 MEMB
ATOM 2212 H3R POPC 1 -10.136 13.667 10.696 1.00 0.00 MEMB
ATOM 2213 H3S POPC 1 -10.658 12.124 9.934 1.00 0.00 MEMB
ATOM 2214 C24 POPC 1 -8.663 12.751 9.396 1.00 0.00 MEMB
ATOM 2215 H4R POPC 1 -7.763 13.166 9.894 1.00 0.00 MEMB
ATOM 2216 H4S POPC 1 -8.961 13.479 8.607 1.00 0.00 MEMB
*I intentionally skipped 10 atoms that did not matter above
In order the columns denote: substance (not relevant), atom number, type of atom, residue number/molecule type, residue number, x-coord, y-coord, z-coord, alpha number (not relevant), beta column (not relevant), and overall molecule type.
TLDR; My averaging script:
#Averaging theta values
for (my $t=2; ($t <= 18); $t++) {
for (my $j=1; ($j <= $lipid_num); $j++) {
$sum[$t]= $theta_values[$t][$j] + $sum[$t];
}
$average[$t]= $sum[$t] / $lipid_num;
print "Average theta for carbon $t is $average[$t]\n";
}
#Averaging Theta2 values
for (my $q=2; ($q <= 18); $q++) {
for (my $b=1; ($b <= $lipid_num); $b++) {
$sum2[$q]= $theta2_values[$q][$b] + $sum2[$q];
}
$average2[$q]= $sum2[$q] / $lipid_num;
print "Average theta2 for carbon $q is $average2[$q]\n";
}
Does not find values at all positions even though I have verified that there are values at all positions.
This is the full script, I realize how large it is.
#Usage: #
# perl angle_checker.pl [granuphilin_prot-memb_system].pdb
#!/usr/bin/perl
use strict;
use warnings;
use Math::Trig;
my $inputfile = $ARGV[0];
open (INPUTFILE, "<", $inputfile) or die $!;
my @data = <INPUTFILE>;
#Quick Change Variables
my $lipid_num = 256;
#Library
my @sum;
my @average;
my @sum2;
my @average2;
my @x1;
my @y1;
my @z1;
my $R = 'R';
my $S = 'S';
my $one = '1';
my @theta_values;
my @theta2_values;
my @vectorCtoHR;
my @vectorCtoHS;
my @normal;
#Start for lipid count
for (my $lipid=1; ($lipid <= $lipid_num); $lipid++) {
# First Carbon/Integer counter
for (my $integer= 2; ($integer <= 8); $integer++) {
#Split line 1
for (my $line = 0; $line <= $#data; ++$line) {
#Search 1.1
if(($data[$line] =~ m/\s+C2$integer\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
chomp $data[$line];
my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
$x1[0]= $splitline[5];
$y1[0]= $splitline[6];
$z1[0]= $splitline[7];
}
}
#Search 1.2
if(($data[$line] =~ m/\s+H$integer$R\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
$x1[1]= $splitline[5];
$y1[1]= $splitline[6];
$z1[1]= $splitline[7];
}
}
#Search 1.3
if(($data[$line] =~ m/\s+H$integer$S\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
$x1[2]= $splitline[5];
$y1[2]= $splitline[6];
$z1[2]= $splitline[7];
}
}
}
#Z-axis
$normal[0]= 0;
$normal[1]= 0;
$normal[2]= 100;
#Vector 1
$vectorCtoHR[0]=($x1[0] - ($x1[1]));
$vectorCtoHR[1]=($y1[0] - ($y1[1]));
$vectorCtoHR[2]=($z1[0] - ($z1[1]));
#Vector 2
$vectorCtoHS[0]=($x1[0] - ($x1[2]));
$vectorCtoHS[1]=($y1[0] - ($y1[2]));
$vectorCtoHS[2]=($z1[0] - ($z1[2]));
#First Angle
my $x1mag = sqrt(($vectorCtoHS[0]**2)+($vectorCtoHS[1]**2)+($vectorCtoHS[2]**2));
my $x2mag = sqrt(($normal[0]**2)+($normal[1]**2)+($normal[2]**2));
#Dot product
my $dotproduct = (($vectorCtoHS[0]*$normal[0])+($vectorCtoHS[1]*$normal[1])+($vectorCtoHS[2]*$normal[2]));
my $theta = acos($dotproduct/($x1mag*$x2mag));
$theta_values[$integer][$lipid]= $theta;
# Second Angle
my $x3mag = sqrt(($vectorCtoHR[0]**2)+($vectorCtoHR[1]**2)+($vectorCtoHR[2]**2));
my $dotproduct2 = (($vectorCtoHR[0]*$normal[0])+($vectorCtoHR[1]*$normal[1])+($vectorCtoHR[2]*$normal[2]));
my $theta2 = acos($dotproduct2/($x3mag*$x2mag));
$theta2_values[$integer][$lipid]= $theta2;
}
#Section 2 Search These only have one hydrogen to search for, hence 1 less search
for (my $integer = 9; ($integer <= 10); $integer++) {
for (my $line = 0; $line <= $#data; ++$line) {
if(($data[$line] =~ m/\s+C2$integer\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
chomp $data[$line];
my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
$x1[0]= $splitline[5];
$y1[0]= $splitline[6];
$z1[0]= $splitline[7];
}
}
if(($data[$line] =~ m/\s+H$integer$one\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
$x1[1]= $splitline[5];
$y1[1]= $splitline[6];
$z1[1]= $splitline[7];
}
}
}
$normal[0]= 0;
$normal[1]= 0;
$normal[2]= 100;
$vectorCtoHR[0]=($x1[0] - ($x1[1]));
$vectorCtoHR[1]=($y1[0] - ($y1[1]));
$vectorCtoHR[2]=($z1[0] - ($z1[1]));
my $x1mag = sqrt(($vectorCtoHR[0]**2)+($vectorCtoHR[1]**2)+($vectorCtoHR[2]**2));
my $x2mag = sqrt(($normal[0]**2)+($normal[1]**2)+($normal[2]**2));
#Dot product
my $dotproduct = (($vectorCtoHR[0]*$normal[0])+($vectorCtoHR[1]*$normal[1])+($vectorCtoHR[2]*$normal[2]));
my $theta = acos($dotproduct/($x1mag*$x2mag));
$theta_values[$integer][$lipid]= $theta;
$theta2_values[$integer][$lipid]= $theta;
}
#Effectively the same as section 1
for (my $integer= 11; ($integer <= 18); $integer++) {
for (my $line = 0; $line <= $#data; ++$line) {
if(($data[$line] =~ m/\s+C2$integer\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
chomp $data[$line];
my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
$x1[0]= $splitline[5];
$y1[0]= $splitline[6];
$z1[0]= $splitline[7];
}
}
if(($data[$line] =~ m/\s+H$integer$R\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
$x1[1]= $splitline[5];
$y1[1]= $splitline[6];
$z1[1]= $splitline[7];
}
}
if(($data[$line] =~ m/\s+H$integer$S\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
$x1[2]= $splitline[5];
$y1[2]= $splitline[6];
$z1[2]= $splitline[7];
}
}
}
$normal[0]= 0;
$normal[1]= 0;
$normal[2]= 100;
$vectorCtoHR[0]=($x1[0] - ($x1[1]));
$vectorCtoHR[1]=($y1[0] - ($y1[1]));
$vectorCtoHR[2]=($z1[0] - ($z1[1]));
$vectorCtoHS[0]=($x1[0] - ($x1[2]));
$vectorCtoHS[1]=($y1[0] - ($y1[2]));
$vectorCtoHS[2]=($z1[0] - ($z1[2]));
#First Angle
my $x1mag = sqrt(($vectorCtoHS[0]**2)+($vectorCtoHS[1]**2)+($vectorCtoHS[2]**2));
my $x2mag = sqrt(($normal[0]**2)+($normal[1]**2)+($normal[2]**2));
#Dot product
my $dotproduct = (($vectorCtoHS[0]*$normal[0])+($vectorCtoHS[1]*$normal[1])+($vectorCtoHS[2]*$normal[2]));
my $theta = acos($dotproduct/($x1mag*$x2mag));
$theta_values[$integer][$lipid]= $theta;
}
print "done with $lipid\n";
#End of lipid search
}
#Averaging starts now
#Averaging theta values
for (my $t=2; ($t <= 18); $t++) {
for (my $j=1; ($j <= $lipid_num); $j++) {
$sum[$t]= $theta_values[$t][$j] + $sum[$t];
}
$average[$t]= $sum[$t] / $lipid_num;
print "Average theta for carbon $t is $average[$t]\n";
}
#Averaging Theta2 values
for (my $q=2; ($q <= 18); $q++) {
for (my $b=1; ($b <= $lipid_num); $b++) {
$sum2[$q]= $theta2_values[$q][$b] + $sum2[$q];
}
$average2[$q]= $sum2[$q] / $lipid_num;
print "Average theta2 for carbon $q is $average2[$q]\n";
}
Upvotes: 3
Views: 660
Reputation: 31
Turns out I was adding my $sum[$t], which did not have a value, to something that did and this was giving the error. To fix this I changed from:
$sum[$t]= $theta_values[$t][$j] + $sum[$t];
To:
$sum[$t]+= $theta_values[$t][$j];
Thank you all for the help.
Upvotes: 0
Reputation: 2808
Without the input data it's impossible to reproduce the problem locally making it near impossible to help with debugging. Having looked at your code though, there a several things I can suggest that will simplify the code and hopefully make it easier to find the problem.
Firstly, nearly all of your loops iterate over an integer variable between two values in C-style for
loop. That form of for
is there if you absolutely need it but perl has far more expressive - and therefore far easier to read and understand the authors intent - forms of a for loop.
Where you simply need an integer range; eg
for (my $integer= 2; ($integer <= 8); $integer++) {
you can simply state "I want $integer to go from 2 to 8";
for my $integer (2 .. 8)
Where you're using the integer solely for the purpose of indexing back into an array to get at the contents, you can simply tell perl you want to iterate over the array contents - ie instead of;
for (my $line = 0; $line <= $#data; ++$line) {
if(($data[$line] =~ ... etc ...
chomp $data[$line];
you can more simply;
for my $line (@data) {
if(($line =~ ... etc ...
chomp $line;
Secondly, where you've got several regexs in play, it helps to separate their definition from their use. It allows the reader to consume and understand the regex itself separately from (later) seeing how/why it being applied. Also, 'extended mode' regex allow whitespace in the regex definition. It's simply amazing how much easier it is to read regexes (regexen?) with whitespace - so much so, that one should consider making it a rule to simply just always use /x
. Togeather, we can replace;
if(($data[$line] =~ m/\s+C2$integer\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
with;
my $has_POPC = qr/ \s+ POPC \s+ /x;
my $has_lipid = qr/ \s+ $lipid \s+ /x;
my $has_C2_integer = qr/ \s+ C2 $integer \s+ /x;
if( $line =~ $has_C2_integer && $line =~ $has_lipid && $line =~ $has_POPC) {
Thirdly - perhaps most imprtantly because I think its potentially a bug - within several of your inner loops you have;
my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
$x1[0]= $splitline[5];
$y1[0]= $splitline[6];
$z1[0]= $splitline[7];
}
Again - I don't have the input data and therefore can't check - but this is almost certainly a mistake. You're splitting the line on whitespace - for the sake of discussion, let say it has 10 "pieces". You then iterate over those pieces (putting them into the default topic, $_
) but make no reference to the topical piece itself - ie, you're not using $_
. Therefore, the code is putting pieces 5, 6 & 7 into x1, y1 & z1 (respectively) - 10 times over. Now, it probably doesn't matter but as I said, its almost certainly not what you wanted and therefore is a bug waiting to happen. You may (its a matter of balance between terseness vs readabillity) want to consolidate the three assignments into list form and (again, optionally) eliminate the temporary variable, @splitline
;
if( $line =~ $has_C2_integer && $line =~ $has_lipid && $line =~ $has_POPC) {
( $x1[0], $y1[0], $z1[0] ) = (split /\s+/ $line)[ 5, 6, 7 ];
}
Putting these ideas togeather enables replacing this;
#Start for lipid count
for (my $lipid=1; ($lipid <= $lipid_num); $lipid++) {
# First Carbon/Integer counter
for (my $integer= 2; ($integer <= 8); $integer++) {
#Split line 1
for (my $line = 0; $line <= $#data; ++$line) {
#Search 1.1
if(($data[$line] =~ m/\s+C2$integer\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
chomp $data[$line];
my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
$x1[0]= $splitline[5];
$y1[0]= $splitline[6];
$z1[0]= $splitline[7];
}
}
#Search 1.2
if(($data[$line] =~ m/\s+H$integer$R\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
$x1[1]= $splitline[5];
$y1[1]= $splitline[6];
$z1[1]= $splitline[7];
}
}
#Search 1.3
if(($data[$line] =~ m/\s+H$integer$S\s+/)&&($data[$line] =~ m/\s+$lipid\s+/)&&($data[$line] =~ m/\s+POPC\s+/)) {
my @splitline = (split /\s+/, $data[$line]);
foreach (@splitline) {
$x1[2]= $splitline[5];
$y1[2]= $splitline[6];
$z1[2]= $splitline[7];
}
}
}
with ;
my $has_POPC = qr/ \s+ POPC \s+ /x;
#Start for lipid count
for my $lipid (1 .. $lipid_num) {
my $has_lipid = qr/ \s+ $lipid \s+ /x;
# First Carbon/Integer counter
for my $integer (2 .. 8) {
my $has_C2_integer = qr/ \s+ C2 $integer \s+ /x;
my $has_H_integer_R = qr/ \s+ H $integer R \s+ /x;
my $has_H_integer_S = qr/ \s+ H $integer S \s+ /x;
#Split line 1
for my $line (@data) {
chomp $line;
#Search 1.1
if ($line =~ $has_C2_integer && $line =~ $has_lipid && $line =~ $has_POPC) {
($x1[0], $y1[0], $z1[0]) = (split /\s+/ $line)[ 5, 6, 7 ];
}
#Search 1.2
if ($line =~ $has_H_integer_R && $line =~ $has_lipid && $line =~ $has_POPC) {
($x1[1], $y1[1], $z1[1]) = (split /\s+/ $line)[ 5, 6, 7 ];
}
#Search 1.3
if ($line =~ $has_H_integer_S && $line =~ $has_lipid && $line =~ $has_POPC) {
($x1[2], $y1[2], $z1[2]) = (split /\s+/ $line)[ 5, 6, 7 ];
}
}
}
}
...that's a reduction in the number of lines by about a third and is (arguably) more readable. Later in the program the structure is repeated almost verbatim, so you should be able to obtain the same reduction again. This has to be rigorously checked of course, which I can't do. Finally, have a good look at the Perl debugger. It only takes about 15 minutes to get across the basics and it will repay you 10 times over (at least).
Upvotes: 3