Reputation: 51

Calculating distances between protein and ligand

I am trying to calculate distance between each coordinate of protein atom (ATOM) and ligand atom (HETATM). I have number of files that look like this:

ATOM   1592 HD13 LEU D  46      11.698 -10.914   2.183  1.00  0.00           H  
ATOM   1593 HD21 LEU D  46      11.528  -8.800   5.301  1.00  0.00           H  
ATOM   1594 HD22 LEU D  46      12.997  -9.452   4.535  1.00  0.00           H  
ATOM   1595 HD23 LEU D  46      11.722  -8.718   3.534  1.00  0.00           H  
HETATM 1597  N1  308 A   1       0.339   6.314  -9.091  1.00  0.00           N  
HETATM 1598  C10 308 A   1      -0.195   5.226  -8.241  1.00  0.00           C  
HETATM 1599  C7  308 A   1      -0.991   4.254  -9.133  1.00  0.00           C  
HETATM 1600  C1  308 A   1      -1.468   3.053  -8.292  1.00  0.00           C

So I am trying to calculate distances between ATOM1 and all other HETATM1, between ATOM1 and all other 'HETATM2' and so on. I have written a script in perl, but I cannot figure it out what is wrong with the script, it doesnt give me any error it just does not print anything.

I am also not sure how to add it in the script and if it is possible, that if the result of each calculation is more then 5 then delete this both lines that were included into calculation. If it is <= then 5 then keep it.

#!/usr/local/bin/perl 

    use strict;
    use warnings;

    open(IN, $ARGV[0]) or die "$!"; 
    my (@refer, @points);
    my $part = 0;
    my $dist;
    while (my $line = <IN>) { 
        chomp($line);
        if ($line =~ /^HETATM/) {
            $part++;
            next;
        }
        my @array = (substr($line, 30, 8),substr($line,38,8),substr($line,46,8));
    #    print "@array\n";
        if ($part == 0) {
            push @refer, [ @array ]; 
        } elsif ($part ==1){
            push @points, [ @array ]; 
        }
    }

        foreach my $ref(@refer) {
        my ($x1, $y1, $z1) = @{$ref};
        foreach my $atom(@points) {
            my ($x, $y, $z) = @{$atom};
            my $dist = sqrt( ($x-$x1)**2 + ($y-$y1)**2 + ($z-$z1)**2 );
        print $dist;

        }

    }

Upvotes: 1

Answers (3)

Sobrique

Reputation: 53498

OK, I have to say - I'd rewrite your code, to work a bit differently.

Something like this:

#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;

my %coordinates; 
#use types to track different types. Unclear if you need to handle anything other than 'not ATOM' but this is in case you do. 

my %types; 

#read STDIN or files specified on command line - like how grep/sed do it. 
while ( <> ) {
   my ( $type, $id, undef, undef, undef, undef, $x, $y, $z ) = split; # splits on white space. 
   $coordinates{$type}{$id} = [$x, $y, $z];
   $types{$type}++ if $type ne 'ATOM'; 
}

#print for debugging:
print Dumper \%coordinates;
print Dumper \%types;

#iterate each element of "ATOM"
foreach my $atom_id ( keys %{$coordinates{'ATOM'}} ) { 
   #iterate all the types (HETATM)
   foreach my $type ( sort keys %types ) { 
      #iterate each id within the data structure. 
      foreach my $id ( sort keys %{$coordinates{$type}} ) { 

         my $dist = 0;
         #take square of x - x1, y - y1, z - z1
         #do it iteratively, using 'for' loop.
         $dist += (($coordinates{$type}{$id}[$_] - $coordinates{'ATOM'}{$atom_id}[$_]) ** 2) for 0..2; 
         $dist = sqrt $dist; 

         print "$atom_id -> $type $id $dist\n";
      }

This is:

Using <> to read STDIN or named files on command line instead of manually opening ARGV[0] which accomplishes a similar result. (but means you can pipe stuff through it too).
Reads your data into a hash first.
Then iterates all the possible pairings, calculating your distance.
Conditionally prints if they match the criteria (all your results seem to?)

This gives as results:

1592 -> HETATM 1597 23.5145474334506
1592 -> HETATM 1598 22.5965224094328
1592 -> HETATM 1599 22.7844420822631
1592 -> HETATM 1600 21.8665559702483
1595 -> HETATM 1597 22.6919443415499
1595 -> HETATM 1598 21.7968036647578
1595 -> HETATM 1599 22.1437585337268
1595 -> HETATM 1600 21.2693868505888
1594 -> HETATM 1597 24.3815421169376
1594 -> HETATM 1598 23.509545380547
1594 -> HETATM 1599 23.8816415683679
1594 -> HETATM 1600 23.0248383056212
1593 -> HETATM 1597 23.6802952050856
1593 -> HETATM 1598 22.74957513889
1593 -> HETATM 1599 23.1402816102138
1593 -> HETATM 1600 22.2296935201545

Now you mention wanting to delete lines that are 'too far' - that's a bit complicated, because you've a compound criteria (and you'll delete all your lines).

The problem is - you don't know if your ATOM lines have too much "distance" until you've tested every single pairing in the file.

You could perhaps do this by:

#iterate each element of "ATOM"
foreach my $atom_id ( keys %{$coordinates{'ATOM'}} ) { 
   #iterate all the types (HETATM)
   foreach my $type ( sort keys %types ) { 
      #iterate each id within the data structure. 
      foreach my $id ( sort keys %{$coordinates{$type}} ) { 

         my $dist = 0;
         #take square of x - x1, y - y1, z - z1
         #do it iteratively, using 'for' loop.
         $dist += (($coordinates{$type}{$id}[$_] - $coordinates{'ATOM'}{$atom_id}[$_]) ** 2) for 0..2; 
         $dist = sqrt $dist; 

         print "### $atom_id -> $type $id $dist\n";

         ##note - this will print out multiple times if there's multiple pairings. 
         if ( $dist <= 5 ) {
            print $lines{'ATOM'}{$atom_id};
            print $lines{$type}{$id};
         }
      }
   }
}

Which will - for each pairing-comparison print both the ATOM and HETATM lines that had a distance of <= 5. But it will do so multiple times if multiple pairings exist.

But I think your core error is in mishandling the $part and next clauses.

You only ever increment $part and whilst you initialise it at 0, you never reset it to zero. So it'll be 1,2,3,4 for each successive HETATM.
You use next after incrementing part which means you skip the if ( $part == 1 clause entirely.

Upvotes: 1

Hynek -Pichi- Vychodil

Reputation: 26131

I would use this approach:

#!/usr/bin/env perl 

use strict;
use warnings;

use constant POZITION => ( 6, 7, 8 );    # X, Y, Z

sub dist {
    my ( $a, $b ) = @_;
    my $s = 0;
    for my $i ( 0 .. $#$a ) {
        $s += ( $a->[$i] - $b->[$i] )**2;
    }
    return sqrt($s);
}

# Record format
use constant {
    LINE => 0,
    POZ  => 1,
    KEEP => 2,
};

my ( @refer, @points );
while ( my $line = <> ) {
    my ( $type, @poz ) = ( split ' ', $line )[ 0, POZITION ];
    print STDERR join( ',', $type, @poz ), "\n";
    if ( $type eq 'ATOM' ) {
        push @refer, [ $line, \@poz ];
    }
    elsif ( $type eq 'HETATM' ) {
        push @points, [ $line, \@poz ];
    }
}

for my $ref (@refer) {
    for my $atom (@points) {
        my $dist = dist( $ref->[POZ], $atom->[POZ] );
        print STDERR "$ref->[LINE]$atom->[LINE]dist: $dist\n";
        next if $dist > 5;
        $ref->[KEEP]  ||= 1;
        $atom->[KEEP] ||= 1;
    }
}

print $_->[LINE] for grep $_->[KEEP], @refer, @points;

Unfortunately, your data doesn't contain any ATOM and HETATM pair with distance <= 5. (Note that split ' ' is word split. It means split /\s+/ with omitting any leading and trailing whitespaces.)

It works as a filter with debugging output to the STDERR.

Upvotes: -1

dgw

Reputation: 13666

When seeing a line with HETATM you increment $part and skip to the next input line. Your array @refer will therefor be empty.

Remove the line with next; after incrementing $part.

And your test should be } elsif( $part ) { ... } since you increment $part for each line of HETATM.

Upvotes: 2

Calculating distances between protein and ligand

Answers (3)

Related Questions