milan
milan

Reputation: 51

Calculating distances between protein and ligand

I am trying to calculate distance between each coordinate of protein atom (ATOM) and ligand atom (HETATM). I have number of files that look like this:

ATOM   1592 HD13 LEU D  46      11.698 -10.914   2.183  1.00  0.00           H  
ATOM   1593 HD21 LEU D  46      11.528  -8.800   5.301  1.00  0.00           H  
ATOM   1594 HD22 LEU D  46      12.997  -9.452   4.535  1.00  0.00           H  
ATOM   1595 HD23 LEU D  46      11.722  -8.718   3.534  1.00  0.00           H  
HETATM 1597  N1  308 A   1       0.339   6.314  -9.091  1.00  0.00           N  
HETATM 1598  C10 308 A   1      -0.195   5.226  -8.241  1.00  0.00           C  
HETATM 1599  C7  308 A   1      -0.991   4.254  -9.133  1.00  0.00           C  
HETATM 1600  C1  308 A   1      -1.468   3.053  -8.292  1.00  0.00           C 

So I am trying to calculate distances between ATOM1 and all other HETATM1, between ATOM1 and all other 'HETATM2' and so on. I have written a script in perl, but I cannot figure it out what is wrong with the script, it doesnt give me any error it just does not print anything.

I am also not sure how to add it in the script and if it is possible, that if the result of each calculation is more then 5 then delete this both lines that were included into calculation. If it is <= then 5 then keep it.

#!/usr/local/bin/perl 

    use strict;
    use warnings;

    open(IN, $ARGV[0]) or die "$!"; 
    my (@refer, @points);
    my $part = 0;
    my $dist;
    while (my $line = <IN>) { 
        chomp($line);
        if ($line =~ /^HETATM/) {
            $part++;
            next;
        }
        my @array = (substr($line, 30, 8),substr($line,38,8),substr($line,46,8));
    #    print "@array\n";
        if ($part == 0) {
            push @refer, [ @array ]; 
        } elsif ($part ==1){
            push @points, [ @array ]; 
        }
    }

        foreach my $ref(@refer) {
        my ($x1, $y1, $z1) = @{$ref};
        foreach my $atom(@points) {
            my ($x, $y, $z) = @{$atom};
            my $dist = sqrt( ($x-$x1)**2 + ($y-$y1)**2 + ($z-$z1)**2 );
        print $dist;

        }

    }

Upvotes: 1

Views: 397

Answers (3)

Sobrique
Sobrique

Reputation: 53498

OK, I have to say - I'd rewrite your code, to work a bit differently.

Something like this:

#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;

my %coordinates; 
#use types to track different types. Unclear if you need to handle anything other than 'not ATOM' but this is in case you do. 

my %types; 

#read STDIN or files specified on command line - like how grep/sed do it. 
while ( <> ) {
   my ( $type, $id, undef, undef, undef, undef, $x, $y, $z ) = split; # splits on white space. 
   $coordinates{$type}{$id} = [$x, $y, $z];
   $types{$type}++ if $type ne 'ATOM'; 
}

#print for debugging:
print Dumper \%coordinates;
print Dumper \%types;

#iterate each element of "ATOM"
foreach my $atom_id ( keys %{$coordinates{'ATOM'}} ) { 
   #iterate all the types (HETATM)
   foreach my $type ( sort keys %types ) { 
      #iterate each id within the data structure. 
      foreach my $id ( sort keys %{$coordinates{$type}} ) { 

         my $dist = 0;
         #take square of x - x1, y - y1, z - z1
         #do it iteratively, using 'for' loop.
         $dist += (($coordinates{$type}{$id}[$_] - $coordinates{'ATOM'}{$atom_id}[$_]) ** 2) for 0..2; 
         $dist = sqrt $dist; 

         print "$atom_id -> $type $id $dist\n";
      }

This is:

  • Using <> to read STDIN or named files on command line instead of manually opening ARGV[0] which accomplishes a similar result. (but means you can pipe stuff through it too).
  • Reads your data into a hash first.
  • Then iterates all the possible pairings, calculating your distance.
  • Conditionally prints if they match the criteria (all your results seem to?)

This gives as results:

1592 -> HETATM 1597 23.5145474334506
1592 -> HETATM 1598 22.5965224094328
1592 -> HETATM 1599 22.7844420822631
1592 -> HETATM 1600 21.8665559702483
1595 -> HETATM 1597 22.6919443415499
1595 -> HETATM 1598 21.7968036647578
1595 -> HETATM 1599 22.1437585337268
1595 -> HETATM 1600 21.2693868505888
1594 -> HETATM 1597 24.3815421169376
1594 -> HETATM 1598 23.509545380547
1594 -> HETATM 1599 23.8816415683679
1594 -> HETATM 1600 23.0248383056212
1593 -> HETATM 1597 23.6802952050856
1593 -> HETATM 1598 22.74957513889
1593 -> HETATM 1599 23.1402816102138
1593 -> HETATM 1600 22.2296935201545

Now you mention wanting to delete lines that are 'too far' - that's a bit complicated, because you've a compound criteria (and you'll delete all your lines).

The problem is - you don't know if your ATOM lines have too much "distance" until you've tested every single pairing in the file.

You could perhaps do this by:

#iterate each element of "ATOM"
foreach my $atom_id ( keys %{$coordinates{'ATOM'}} ) { 
   #iterate all the types (HETATM)
   foreach my $type ( sort keys %types ) { 
      #iterate each id within the data structure. 
      foreach my $id ( sort keys %{$coordinates{$type}} ) { 

         my $dist = 0;
         #take square of x - x1, y - y1, z - z1
         #do it iteratively, using 'for' loop.
         $dist += (($coordinates{$type}{$id}[$_] - $coordinates{'ATOM'}{$atom_id}[$_]) ** 2) for 0..2; 
         $dist = sqrt $dist; 

         print "### $atom_id -> $type $id $dist\n";

         ##note - this will print out multiple times if there's multiple pairings. 
         if ( $dist <= 5 ) {
            print $lines{'ATOM'}{$atom_id};
            print $lines{$type}{$id};
         }
      }
   }
}

Which will - for each pairing-comparison print both the ATOM and HETATM lines that had a distance of <= 5. But it will do so multiple times if multiple pairings exist.

But I think your core error is in mishandling the $part and next clauses.

  • You only ever increment $part and whilst you initialise it at 0, you never reset it to zero. So it'll be 1,2,3,4 for each successive HETATM.
  • You use next after incrementing part which means you skip the if ( $part == 1 clause entirely.

Upvotes: 1

Hynek -Pichi- Vychodil
Hynek -Pichi- Vychodil

Reputation: 26131

I would use this approach:

#!/usr/bin/env perl 

use strict;
use warnings;

use constant POZITION => ( 6, 7, 8 );    # X, Y, Z

sub dist {
    my ( $a, $b ) = @_;
    my $s = 0;
    for my $i ( 0 .. $#$a ) {
        $s += ( $a->[$i] - $b->[$i] )**2;
    }
    return sqrt($s);
}

# Record format
use constant {
    LINE => 0,
    POZ  => 1,
    KEEP => 2,
};

my ( @refer, @points );
while ( my $line = <> ) {
    my ( $type, @poz ) = ( split ' ', $line )[ 0, POZITION ];
    print STDERR join( ',', $type, @poz ), "\n";
    if ( $type eq 'ATOM' ) {
        push @refer, [ $line, \@poz ];
    }
    elsif ( $type eq 'HETATM' ) {
        push @points, [ $line, \@poz ];
    }
}

for my $ref (@refer) {
    for my $atom (@points) {
        my $dist = dist( $ref->[POZ], $atom->[POZ] );
        print STDERR "$ref->[LINE]$atom->[LINE]dist: $dist\n";
        next if $dist > 5;
        $ref->[KEEP]  ||= 1;
        $atom->[KEEP] ||= 1;
    }
}

print $_->[LINE] for grep $_->[KEEP], @refer, @points;

Unfortunately, your data doesn't contain any ATOM and HETATM pair with distance <= 5. (Note that split ' ' is word split. It means split /\s+/ with omitting any leading and trailing whitespaces.)

It works as a filter with debugging output to the STDERR.

Upvotes: -1

dgw
dgw

Reputation: 13666

When seeing a line with HETATM you increment $part and skip to the next input line. Your array @refer will therefor be empty.

Remove the line with next; after incrementing $part.

And your test should be } elsif( $part ) { ... } since you increment $part for each line of HETATM.

Upvotes: 2

Related Questions