Reputation: 955
I have a PDB file. Now it has two parts separated by TER. Before TER I call it part 1. I want to take x,y,z of ATOM 1 of first part i.e before TER and find distance to all x,y,z co ordinates after TER and then second ATOM of part one to all ATOMS of part second. This has to be repeated for all ATOMS of first part= to all ATOMS of second part. I have to automate it for 20 files. names of my files begin like 1_0.pdb,2_0.pdb....20_0.pdb. This is a distance calculation. I have tried something in PERL but its very rough. Can someone help a bit. The File looks like:
----long file (I truncated it)----
ATOM 1279 C ALA 81 -1.925 -11.270 1.404
ATOM 1280 O ALA 81 -0.279 9.355 15.557
ATOM 1281 OXT ALA 81 -2.188 10.341 15.346
TER
ATOM 1282 N THR 82 29.632 5.205 5.525
ATOM 1283 H1 THR 82 30.175 4.389 5.768
ATOM 1284 H2 THR 82 28.816 4.910 5.008
The code is: In the end it finds the maximum distance and its co ordinates
my @points = ();
open(IN, @ARGV[0]) or die "$!";
while (my $line = <IN>) {
chomp($line);
my @array = (split (/\s+/, $line))[5, 6, 7];
print "@array\n";
push @points, [ @array ];
}
close(IN);
$max=0;
for my $i1 ( 0 .. $#points )
{
my ( $x1, $y1, $z1 ) = @{ $points[$i1] };
my $dist = sqrt( ($x1+1.925)**2 + ($y1+11.270)**2 + ($z1-1.404)**2 );
print "distance from (-1.925 -11.270 1.404) to ( $x1, $y1, $z1 ) is $dist\n";
if ( $dist > $max )
{ $max = $dist;
$x=$x1;
$y=$y1;
$z=$z1;
}}
print "maximum value is : $max\n";
print "co ordinates are : $x $y $z\n";
Upvotes: 4
Views: 1836
Reputation: 67910
With proper encapsulation, this is pretty simple, and requires minor modifications of your code.
ETA: Added fixed width solution I had on hand. It would probably be best to read all the fields instead of discarding the first 31 chars, and then return them all in a hash reference. That way, you could process all the lines with the same subroutine, and simply switch between parts when the first field turns out to be TER
. It should be easy for you to extrapolate this from the given code.
You'll note that the reference values are read in with a loop, because we need to break the loop at the break point. The rest of the values are slurped up with a map
statement. Then we simply feed the data to the subroutine we made from your initial code (with some improvements). I used the same names for the lexical variables to make it easier to read the code.
use strict;
use warnings;
my @points;
while (<DATA>) {
last if /^TER$/;
push @points, getpoints($_);
}
my @ref = map getpoints($_), <DATA>;
for my $p (@points) {
getcoords($p, \@ref);
}
sub getpoints {
my $line = shift;
my @data = unpack "A31 A8 A8 A8", $line;
shift @data;
return \@data;
}
sub getcoords {
my ($p, $ref) = @_;
my ($p1,$p2,$p3) = @$p;
my $max=0;
my ($x,$y,$z);
for my $aref ( @$ref ) {
my ( $x1, $y1, $z1 ) = @$aref;
my $dist = sqrt(
($x1-$p1)**2 +
($y1-$p2)**2 +
($z1-$p3)**2
);
print "distance from ($p1 $p2 $p3) to ( $x1, $y1, $z1 ) is $dist\n";
if ( $dist > $max ) {
$max = $dist;
$x=$x1;
$y=$y1;
$z=$z1;
}
}
print "maximum value is : $max\n";
print "co ordinates are : $x $y $z\n";
}
__DATA__
ATOM 1279 C ALA 81 -1.925 -11.270 1.404
ATOM 1280 O ALA 81 -0.279 9.355 15.557
ATOM 1281 OXT ALA 81 -2.188 10.341 15.346
TER
ATOM 1282 N THR 82 29.632 5.205 5.525
ATOM 1283 H1 THR 82 30.175 4.389 5.768
ATOM 1284 H2 THR 82 28.816 4.910 5.008
Upvotes: 0
Reputation: 91518
Not sure I clearly understand what you want, but how about:
#!/usr/local/bin/perl
use strict;
use warnings;
my (@refer, @points);
my $part = 0;
while (my $line = <DATA>) {
chomp($line);
if ($line =~ /^TER/) {
$part++;
next;
}
my @array = (split (/\s+/, $line))[5, 6, 7];
if ($part == 0) {
push @refer, [ @array ];
} else {
push @points, [ @array ];
}
}
my %max = (val=>0, x=>0, y=>0, z=>0);
foreach my $ref(@refer) {
my ($x1, $y1, $z1) = @{$ref};
foreach my $atom(@points) {
my ($x, $y, $z) = @{$atom};
my $dist = sqrt( ($x-$x1)**2 + ($y-$y1)**2 + ($z-$z1)**2 );
if ($dist > $max{val}) {
$max{val} = $dist;
$max{x} = $x;
$max{y} = $y;
$max{z} = $z;
}
}
}
print "max is $max{val}; coord: x=$max{x}, y=$max{y}, z=$max{z}\n";
__DATA__
ATOM 1279 C ALA 81 -1.925 -11.270 1.404
ATOM 1280 O ALA 81 -0.279 9.355 15.557
ATOM 1281 OXT ALA 81 -2.188 10.341 15.346
TER
ATOM 1282 N THR 82 29.632 5.205 5.525
ATOM 1283 H1 THR 82 30.175 4.389 5.768
ATOM 1284 H2 THR 82 28.816 4.910 5.008
output:
max is 35.9813670807545; coord: x=30.175, y=4.389, z=5.768
Upvotes: 1
Reputation: 2151
The main issue here is reading the data. First, note that one cannot use split with PDB text files since the fields are defined by position and not by separators. See Coordinate File Description (PDB Format).
To separate the ATOM record of different polymer chains you can start with a simplified version like
my $iblock = 0;
my @atoms = ();
while (my $line = <IN>) {
chomp($line);
# Switch blocks at TER lines
if ($line =~ /^TER/) {
$iblock++;
# Read ATOM lines
} elsif ($line =~ m/^ATOM/) {
my @xyz = (substr($line,7-1,9),substr($line,16-1,9),substr($line,25-1,9));
printf "Block %d: atom at (%s)\n",$iblock,join (",",@xyz);
push @{$atoms[$iblock]},\@xyz;
# Parse additional line types (if needed)
} else {
...
}
}
Followed by a loop over all pairs of coordinates from different blocks, structured as follows:
# 1st block
for my $iblock1 (0..$#atoms) {
# 2nd block
for my $iblock2 ($iblock1+1..$#atoms) {
# Compare all pairs of atoms
...
my $xyz1 (@{$atoms[$iblock1]}) {
for my $xyz2 (@{$atoms[$iblock2]}) {
# Calculate distance and compare with $max_dist
...
}
}
# Print the maximal distance between these two blocks
...
}
}
Of course, the code could be more general if a more elaborate data structure is used or by applying one of the available PDB parsers, such as Bioperl's.
Upvotes: 1