Reputation: 49
I have text files as shown below.
CA 21.660 -6.795 11.323
C 28.811 -9.801 16.262
O 23.221 -9.266 13.799
CB 33.528 -11.934 17.900
N 21.660 -6.795 11.323
O 32.410 -8.539 16.566
I want to calculate the distances between the coordinates of atoms. For example, I would like to read all files in a folder and calculate the distances between first and second atom,first and third, first and fourth, etc.Then, second and third, second and fourth, second and fifth etc.The formula is SQRT ((X1-X2)^2 + (Y1-Y2)^2 + (Z1-Z2)^2). I would like to save the outputs from each file to another folder with the names of input files. How can I do this with awk?
desired output
CA-C 4.52
CA-O 3.80
CA-CB 5.68
CA-N 8.94
--
--
--
N-O 5.98
your help would be appreciated!!
Upvotes: 2
Views: 1462
Reputation: 6613
A Perl solution could be:
#!/usr/bin/perl
use strict;
use warnings;
my @data = map [split], <DATA>;
for (my $i = 0; $i < @data; $i++) {
for (my $j = $i+1; $j < @data; $j++) {
my $d = distance( @data[$i, $j]);
printf("%-6s%7.4f\n",
join("-", map $_->[0], @data[$i, $j]), $d) if $d <= 6;
}
}
sub distance {
my ($coord1, $coord2) = @_;
my $sum;
for my $k (1 .. $#$coord1) {
$sum += ($coord1->[$k] - $coord2->[$k])**2;
}
return sqrt $sum;
}
__DATA__
CA 21.660 -6.795 11.323
C 28.811 -9.801 16.262
O 23.221 -9.266 13.799
CB 33.528 -11.934 17.900
N 21.660 -6.795 11.323
O 32.410 -8.539 16.566
Upvotes: 0
Reputation: 67291
Below is the code in awk:
awk '{a[NR]=$0}
END
{
for(i=1;i<=NR;i++)
{split(a[i],k);
for(j=i+1;j<=NR;j++)
{
split(a[j],l);
dist=(k[2]-l[2])*(k[2]-l[2])+(k[3]-l[3])*(k[3]-l[3])+(k[4]-l[4])*(k[4]-l[4]);
print k[1]"-"l[1],sqrt(dist);
}
}
}' your_file
And Below is the Test:
> cat temp
CA 21.660 -6.795 11.323
C 28.811 -9.801 16.262
O 23.221 -9.266 13.799
CB 33.528 -11.934 17.900
N 21.660 -6.795 11.323
O 32.410 -8.539 16.566
Execution:
> awk '{a[NR]=$0}END{for(i=1;i<=NR;i++){split(a[i],k);for(j=i+1;j<=NR;j++){split(a[j],l);dist=(k[2]-l[2])*(k[2]-l[2])+(k[3]-l[3])*(k[3]-l[3])+(k[4]-l[4])*(k[4]-l[4]);print k[1]"-"l[1],sqrt(dist);}}}' temp
CA-C 9.19601
CA-O 3.83055
CA-CB 14.5092
CA-N 0
CA-O 12.0869
C-O 6.13194
C-CB 5.42981
C-N 9.19601
C-O 3.82595
O-CB 11.4092
O-N 3.83055
O-O 9.62406
CB-N 14.5092
CB-O 3.81517
N-O 12.0869
>
Upvotes: 1
Reputation: 54512
Here's one way using GNU awk
:
awk 'FNR==NR { a[NR]=$0; next } { for (i=FNR+1;i<=NR-1;i++) { split(a[i],b); print $1 "-" b[1], sqrt(($2-b[2])^2 + ($3-b[3])^2 + ($4-b[4])^2) | "column -t" } NR--}' file file
It does what you want, but either the algorithm you've provided is different from what you require, or, your expected output has been incorrectly calculated (I'm assuming the latter is the problem). Anyway, here's the results:
CA-C 9.19601
CA-O 3.83055
CA-CB 14.5092
CA-N 0
CA-O 12.0869
C-O 6.13194
C-CB 5.42981
C-N 9.19601
C-O 3.82595
O-CB 11.4092
O-N 3.83055
O-O 9.62406
CB-N 14.5092
CB-O 3.81517
N-O 12.0869
If you need to perform this on multiple files in the present working directory, and assuming there's nothing but files of interest in this directory, you can wrap a for
loop around the awk
statement. Obviously, you'll need to change /path/to/folder/
to your path of choice for it to work correctly:
for i in *; do awk 'FNR==NR { a[NR]=$0; next } { for (i=FNR+1;i<=NR-1;i++) { split(a[i],b); print $1 "-" b[1], sqrt(($2-b[2])^2 + ($3-b[3])^2 + ($4-b[4])^2) | "column -t > /path/to/folder/" FILENAME } NR--}' "$i"{,}; done
Upvotes: 4
Reputation: 125
If atoms is a file containing your data
awk '{ p[NR,0]=$1;p[NR,1]=$2;p[NR,2]=$3;p[NR,3]=$4; for (j=1;j<=NR-1;j++) print p[j,0]"-"$1,sqrt((p[NR,1]-p[j,1])^2+(p[NR,2]-p[j,2])^2+(p[NR,3]-p[j,3])^2) }' atoms
CA-C 9.19601
CA-O 3.83055
C-O 6.13194
CA-CB 14.5092
C-CB 5.42981
O-CB 11.4092
CA-N 0
C-N 9.19601
O-N 3.83055
CB-N 14.5092
CA-O 12.0869
C-O 3.82595
O-O 9.62406
CB-O 3.81517
N-O 12.0869
There is a problem:
Upvotes: 3
Reputation: 204015
Something like this sounds like what you want, but clearly neither of the results match what you say they should so clarify your algorithm:
$ awk 'NR>1{print p[1]"-"$1,sqrt((p[2]-$2)^2 + (p[3]-$3)^2 + (p[4]-$4)^2)} {split($0,p) }' file
CA-C 9.19601
C-O 6.13194
O-CB 11.4092
CB-N 14.5092
N-O 12.0869
$ awk 'NR>1{print p[1]"-"$1,sqrt(($2-p[2])^2 + ($3-p[3])^2 + ($4-p[4])^2)} {split($0,p) }' file
CA-C 9.19601
C-O 6.13194
O-CB 11.4092
CB-N 14.5092
N-O 12.0869
Upvotes: 1