Reputation: 51
I am trying to calculate distance between each coordinate of protein atom (ATOM
) and ligand atom (HETATM
). I have number of files that look like this:
ATOM 1592 HD13 LEU D 46 11.698 -10.914 2.183 1.00 0.00 H
ATOM 1593 HD21 LEU D 46 11.528 -8.800 5.301 1.00 0.00 H
ATOM 1594 HD22 LEU D 46 12.997 -9.452 4.535 1.00 0.00 H
ATOM 1595 HD23 LEU D 46 11.722 -8.718 3.534 1.00 0.00 H
HETATM 1597 N1 308 A 1 0.339 6.314 -9.091 1.00 0.00 N
HETATM 1598 C10 308 A 1 -0.195 5.226 -8.241 1.00 0.00 C
HETATM 1599 C7 308 A 1 -0.991 4.254 -9.133 1.00 0.00 C
HETATM 1600 C1 308 A 1 -1.468 3.053 -8.292 1.00 0.00 C
So I am trying to calculate distances between ATOM1
and all other HETATM1
, between ATOM1
and all other 'HETATM2' and so on. I have written a script in perl, but I cannot figure it out what is wrong with the script, it doesnt give me any error it just does not print anything.
I am also not sure how to add it in the script and if it is possible, that if the result of each calculation is more then 5
then delete this both lines that were included into calculation. If it is <=
then 5
then keep it.
#!/usr/local/bin/perl
use strict;
use warnings;
open(IN, $ARGV[0]) or die "$!";
my (@refer, @points);
my $part = 0;
my $dist;
while (my $line = <IN>) {
chomp($line);
if ($line =~ /^HETATM/) {
$part++;
next;
}
my @array = (substr($line, 30, 8),substr($line,38,8),substr($line,46,8));
# print "@array\n";
if ($part == 0) {
push @refer, [ @array ];
} elsif ($part ==1){
push @points, [ @array ];
}
}
foreach my $ref(@refer) {
my ($x1, $y1, $z1) = @{$ref};
foreach my $atom(@points) {
my ($x, $y, $z) = @{$atom};
my $dist = sqrt( ($x-$x1)**2 + ($y-$y1)**2 + ($z-$z1)**2 );
print $dist;
}
}
Upvotes: 1
Views: 397
Reputation: 53498
OK, I have to say - I'd rewrite your code, to work a bit differently.
Something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my %coordinates;
#use types to track different types. Unclear if you need to handle anything other than 'not ATOM' but this is in case you do.
my %types;
#read STDIN or files specified on command line - like how grep/sed do it.
while ( <> ) {
my ( $type, $id, undef, undef, undef, undef, $x, $y, $z ) = split; # splits on white space.
$coordinates{$type}{$id} = [$x, $y, $z];
$types{$type}++ if $type ne 'ATOM';
}
#print for debugging:
print Dumper \%coordinates;
print Dumper \%types;
#iterate each element of "ATOM"
foreach my $atom_id ( keys %{$coordinates{'ATOM'}} ) {
#iterate all the types (HETATM)
foreach my $type ( sort keys %types ) {
#iterate each id within the data structure.
foreach my $id ( sort keys %{$coordinates{$type}} ) {
my $dist = 0;
#take square of x - x1, y - y1, z - z1
#do it iteratively, using 'for' loop.
$dist += (($coordinates{$type}{$id}[$_] - $coordinates{'ATOM'}{$atom_id}[$_]) ** 2) for 0..2;
$dist = sqrt $dist;
print "$atom_id -> $type $id $dist\n";
}
This is:
<>
to read STDIN or named files on command line instead of manually opening ARGV[0]
which accomplishes a similar result. (but means you can pipe stuff through it too). This gives as results:
1592 -> HETATM 1597 23.5145474334506
1592 -> HETATM 1598 22.5965224094328
1592 -> HETATM 1599 22.7844420822631
1592 -> HETATM 1600 21.8665559702483
1595 -> HETATM 1597 22.6919443415499
1595 -> HETATM 1598 21.7968036647578
1595 -> HETATM 1599 22.1437585337268
1595 -> HETATM 1600 21.2693868505888
1594 -> HETATM 1597 24.3815421169376
1594 -> HETATM 1598 23.509545380547
1594 -> HETATM 1599 23.8816415683679
1594 -> HETATM 1600 23.0248383056212
1593 -> HETATM 1597 23.6802952050856
1593 -> HETATM 1598 22.74957513889
1593 -> HETATM 1599 23.1402816102138
1593 -> HETATM 1600 22.2296935201545
Now you mention wanting to delete lines that are 'too far' - that's a bit complicated, because you've a compound criteria (and you'll delete all your lines).
The problem is - you don't know if your ATOM
lines have too much "distance" until you've tested every single pairing in the file.
You could perhaps do this by:
#iterate each element of "ATOM"
foreach my $atom_id ( keys %{$coordinates{'ATOM'}} ) {
#iterate all the types (HETATM)
foreach my $type ( sort keys %types ) {
#iterate each id within the data structure.
foreach my $id ( sort keys %{$coordinates{$type}} ) {
my $dist = 0;
#take square of x - x1, y - y1, z - z1
#do it iteratively, using 'for' loop.
$dist += (($coordinates{$type}{$id}[$_] - $coordinates{'ATOM'}{$atom_id}[$_]) ** 2) for 0..2;
$dist = sqrt $dist;
print "### $atom_id -> $type $id $dist\n";
##note - this will print out multiple times if there's multiple pairings.
if ( $dist <= 5 ) {
print $lines{'ATOM'}{$atom_id};
print $lines{$type}{$id};
}
}
}
}
Which will - for each pairing-comparison print both the ATOM
and HETATM
lines that had a distance of <= 5. But it will do so multiple times if multiple pairings exist.
But I think your core error is in mishandling the $part
and next
clauses.
$part
and whilst you initialise it at 0
, you never reset it to zero. So it'll be 1,2,3,4 for each successive HETATM
. next
after incrementing part
which means you skip the if ( $part == 1
clause entirely. Upvotes: 1
Reputation: 26131
I would use this approach:
#!/usr/bin/env perl
use strict;
use warnings;
use constant POZITION => ( 6, 7, 8 ); # X, Y, Z
sub dist {
my ( $a, $b ) = @_;
my $s = 0;
for my $i ( 0 .. $#$a ) {
$s += ( $a->[$i] - $b->[$i] )**2;
}
return sqrt($s);
}
# Record format
use constant {
LINE => 0,
POZ => 1,
KEEP => 2,
};
my ( @refer, @points );
while ( my $line = <> ) {
my ( $type, @poz ) = ( split ' ', $line )[ 0, POZITION ];
print STDERR join( ',', $type, @poz ), "\n";
if ( $type eq 'ATOM' ) {
push @refer, [ $line, \@poz ];
}
elsif ( $type eq 'HETATM' ) {
push @points, [ $line, \@poz ];
}
}
for my $ref (@refer) {
for my $atom (@points) {
my $dist = dist( $ref->[POZ], $atom->[POZ] );
print STDERR "$ref->[LINE]$atom->[LINE]dist: $dist\n";
next if $dist > 5;
$ref->[KEEP] ||= 1;
$atom->[KEEP] ||= 1;
}
}
print $_->[LINE] for grep $_->[KEEP], @refer, @points;
Unfortunately, your data doesn't contain any ATOM
and HETATM
pair with distance <= 5. (Note that split ' '
is word split. It means split /\s+/
with omitting any leading and trailing whitespaces.)
It works as a filter with debugging output to the STDERR.
Upvotes: -1
Reputation: 13666
When seeing a line with HETATM
you increment $part
and skip to the next input line. Your array @refer
will therefor be empty.
Remove the line with next;
after incrementing $part
.
And your test should be } elsif( $part ) { ... }
since you increment $part
for each line of HETATM
.
Upvotes: 2