kanika
kanika

Reputation: 955

Distance between one point to all other in a PDB file

I have a PDB file. Now it has two parts separated by TER. Before TER I call it part 1. I want to take x,y,z of ATOM 1 of first part i.e before TER and find distance to all x,y,z co ordinates after TER and then second ATOM of part one to all ATOMS of part second. This has to be repeated for all ATOMS of first part= to all ATOMS of second part. I have to automate it for 20 files. names of my files begin like 1_0.pdb,2_0.pdb....20_0.pdb. This is a distance calculation. I have tried something in PERL but its very rough. Can someone help a bit. The File looks like:

----long file (I truncated it)----

ATOM   1279 C    ALA    81      -1.925 -11.270   1.404
ATOM   1280 O    ALA    81      -0.279   9.355  15.557
ATOM   1281 OXT  ALA    81      -2.188  10.341  15.346
TER   
ATOM   1282 N    THR    82      29.632   5.205   5.525
ATOM   1283 H1   THR    82      30.175   4.389   5.768
ATOM   1284 H2   THR    82      28.816   4.910   5.008

The code is: In the end it finds the maximum distance and its co ordinates

my @points = (); 
open(IN, @ARGV[0]) or die "$!"; 
while (my $line = <IN>) { 

  chomp($line); 
  my @array = (split (/\s+/, $line))[5, 6, 7]; 
  print "@array\n"; 
  push @points, [ @array ]; 
} 
close(IN); 


 $max=0;
 for my $i1 ( 0 .. $#points  )

{ 
    my ( $x1, $y1, $z1 ) = @{ $points[$i1] };  
    my $dist = sqrt( ($x1+1.925)**2 + ($y1+11.270)**2 + ($z1-1.404)**2 ); 
    print "distance from (-1.925 -11.270 1.404) to ( $x1, $y1, $z1 ) is $dist\n"; 

    if ( $dist > $max )
     { $max = $dist;
       $x=$x1;
       $y=$y1;
       $z=$z1; 
      }}
    print "maximum value is : $max\n";
print "co ordinates are : $x $y $z\n";        

Upvotes: 4

Views: 1836

Answers (3)

TLP
TLP

Reputation: 67910

With proper encapsulation, this is pretty simple, and requires minor modifications of your code.

ETA: Added fixed width solution I had on hand. It would probably be best to read all the fields instead of discarding the first 31 chars, and then return them all in a hash reference. That way, you could process all the lines with the same subroutine, and simply switch between parts when the first field turns out to be TER. It should be easy for you to extrapolate this from the given code.

You'll note that the reference values are read in with a loop, because we need to break the loop at the break point. The rest of the values are slurped up with a map statement. Then we simply feed the data to the subroutine we made from your initial code (with some improvements). I used the same names for the lexical variables to make it easier to read the code.

use strict;
use warnings;

my @points;
while (<DATA>) {
    last if /^TER$/;
    push @points, getpoints($_);
}
my @ref = map getpoints($_), <DATA>;

for my $p (@points) {
    getcoords($p, \@ref);
}

sub getpoints {
    my $line = shift;
    my @data = unpack "A31 A8 A8 A8", $line;
    shift @data;
    return \@data;
}
sub getcoords {
    my ($p, $ref) = @_;
    my ($p1,$p2,$p3) = @$p;
    my $max=0;
    my ($x,$y,$z);
    for my $aref ( @$ref ) {
        my ( $x1, $y1, $z1 ) = @$aref;  
        my $dist = sqrt(
            ($x1-$p1)**2 +
            ($y1-$p2)**2 +
            ($z1-$p3)**2
        ); 
        print "distance from ($p1 $p2 $p3) to ( $x1, $y1, $z1 ) is $dist\n"; 

        if ( $dist > $max ) {
            $max = $dist;
            $x=$x1;
            $y=$y1;
            $z=$z1; 
        }
    }
    print "maximum value is : $max\n";
    print "co ordinates are : $x $y $z\n";
}

__DATA__
ATOM   1279 C    ALA    81      -1.925 -11.270   1.404
ATOM   1280 O    ALA    81      -0.279   9.355  15.557
ATOM   1281 OXT  ALA    81      -2.188  10.341  15.346
TER
ATOM   1282 N    THR    82      29.632   5.205   5.525
ATOM   1283 H1   THR    82      30.175   4.389   5.768
ATOM   1284 H2   THR    82      28.816   4.910   5.008

Upvotes: 0

Toto
Toto

Reputation: 91518

Not sure I clearly understand what you want, but how about:

#!/usr/local/bin/perl 
use strict;
use warnings;

my (@refer, @points);
my $part = 0;
while (my $line = <DATA>) { 
    chomp($line);
    if ($line =~ /^TER/) {
        $part++;
        next;
    }
    my @array = (split (/\s+/, $line))[5, 6, 7]; 
    if ($part == 0) {
        push @refer, [ @array ]; 
    } else {
        push @points, [ @array ]; 
    }
} 
my %max = (val=>0, x=>0, y=>0, z=>0);
foreach my $ref(@refer) {
    my ($x1, $y1, $z1) = @{$ref};
    foreach my $atom(@points) {
        my ($x, $y, $z) = @{$atom};
        my $dist = sqrt( ($x-$x1)**2 + ($y-$y1)**2 + ($z-$z1)**2 );
        if ($dist > $max{val}) {
            $max{val} = $dist;
            $max{x} = $x;
            $max{y} = $y;
            $max{z} = $z;
        }
    }
}
print "max is $max{val}; coord: x=$max{x}, y=$max{y}, z=$max{z}\n";

__DATA__
ATOM   1279 C    ALA    81      -1.925 -11.270   1.404
ATOM   1280 O    ALA    81      -0.279   9.355  15.557
ATOM   1281 OXT  ALA    81      -2.188  10.341  15.346
TER   
ATOM   1282 N    THR    82      29.632   5.205   5.525
ATOM   1283 H1   THR    82      30.175   4.389   5.768
ATOM   1284 H2   THR    82      28.816   4.910   5.008

output:

max is 35.9813670807545; coord: x=30.175, y=4.389, z=5.768

Upvotes: 1

Itamar
Itamar

Reputation: 2151

The main issue here is reading the data. First, note that one cannot use split with PDB text files since the fields are defined by position and not by separators. See Coordinate File Description (PDB Format).

To separate the ATOM record of different polymer chains you can start with a simplified version like

my $iblock = 0;
my @atoms = ();
while (my $line = <IN>) { 
   chomp($line);

   # Switch blocks at TER lines
   if ($line =~ /^TER/) {
      $iblock++;

   # Read ATOM lines
   } elsif ($line =~ m/^ATOM/) {
      my @xyz = (substr($line,7-1,9),substr($line,16-1,9),substr($line,25-1,9)); 
      printf "Block %d: atom at (%s)\n",$iblock,join (",",@xyz); 
      push @{$atoms[$iblock]},\@xyz; 

   # Parse additional line types (if needed)
   } else {
      ...
   }
} 

Followed by a loop over all pairs of coordinates from different blocks, structured as follows:

# 1st block
for my $iblock1 (0..$#atoms) {

   # 2nd block
   for my $iblock2 ($iblock1+1..$#atoms) {

      # Compare all pairs of atoms
      ... 
      my $xyz1 (@{$atoms[$iblock1]}) {
         for my $xyz2 (@{$atoms[$iblock2]}) {
            # Calculate distance and compare with $max_dist
            ...
         }
      }
      # Print the maximal distance between these two blocks
      ...
   }
}

Of course, the code could be more general if a more elaborate data structure is used or by applying one of the available PDB parsers, such as Bioperl's.

Upvotes: 1

Related Questions