Reputation: 95
I have two types of files. A contains 1206 lines of coordinates (xyz) for a protein chain; B contains 114 lines of coordinates (xyz) for a group of molecules.
I would like to do the following: for each line of A, calculate the distance to each line of B. That gives 114 distance values for each line of A, but I don't need all of them, only the shortest one for each line of A. So the desired output is a file with 1206 lines, each containing one value: the shortest distance. It is important to keep the original order of file A.
My code:
import os
import sys
import numpy as np

outdir = r'E:\MTA\aminosavak_tavolsag\tavolsagok'

for dirname, dirnames, filenames in os.walk(r'E:\MTA\aminosavak_tavolsag\receptorok'):
    for path, dirs, files in os.walk(r'E:\MTA\aminosavak_tavolsag\kotohely'):
        for filename in filenames:
            for fileok in files:
                if filename == fileok:
                    with open(os.path.join(outdir, filename) , "a+") as f:
                        data_ligand = np.loadtxt(os.path.join(path, fileok))
                        data_rec = np.loadtxt(os.path.join(dirname, filename))
                        for i in data_rec:
                            for j in data_ligand:
                                dist = np.linalg.norm(i - j)
                                dist_float = dist.tolist()
                                dist_str = str(dist_float)
                                dist_list = dist_str.split()
                                for szamok in dist_list:
                                    for x in range(len(dist_list)):
                                        minimum = min([float(x) for x in dist_list])
                            f.write(str(minimum) + "\r\n")
This code works, but only partially. My ultimate goal is to find the protein residues that are close enough to this group of molecules (the binding site). I can check my results with visualization software, and my code finds far fewer residues than it should.
I just can't figure out where the problem is. Could you help me? Thanks!
Upvotes: 0
Views: 883
Reputation: 854
Your code is pretty confusing and I can see a few mistakes.
You're using `minimum` outside of the `for` loop, so only its last value is written.
Also, the way you compute `minimum` is weird. `szamok` is not used, nor is `x` (since you use another `x` inside the list comprehension), so both `for` loops surrounding `minimum = ...` are useless.
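Another suspicious thing is `str(dist_float)`. Because `i` and `j` are single rows, `np.linalg.norm(i - j)` returns one number, so `dist_float` is a single float rather than a list. Converting it to a string and splitting that string gives a one-element list, so `min(...)` just returns the current distance. Not only is this round trip useless, it also means the distances to the different lines of B are never compared with each other.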
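To see what that chain of conversions produces, here is a small check for a single pair of rows. The coordinates are made up for illustration; only the calls mirror the question's code.

```python
import numpy as np

# Made-up coordinates standing in for one row of A and one row of B.
i = np.array([0.0, 0.0, 0.0])
j = np.array([3.0, 4.0, 0.0])

dist = np.linalg.norm(i - j)          # a single NumPy scalar, not an array
dist_float = dist.tolist()            # a plain Python float
dist_list = str(dist_float).split()   # a list holding one string

print(type(dist_float).__name__, dist_list)
```

Since the list always holds exactly one entry, taking `min` over it cannot pick the nearest row of B.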
Assuming `i` and `j` stand for the data lines of A and B, I would rewrite the end of your code like this:
...
data_ligand = np.loadtxt(os.path.join(path, fileok))
data_rec = np.loadtxt(os.path.join(dirname, filename))
for i in data_rec:
    min_dist = min(np.linalg.norm(i - j) for j in data_ligand)
    f.write("{}\r\n".format(min_dist))  # easier than `str(min_dist)` to customize format
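If you'd rather avoid the Python-level loop, the same per-row minimum can probably be computed with NumPy broadcasting. This is only a sketch under some assumptions: both files load as 2-D arrays with three columns, the helper name `min_distances` is mine, the sample coordinates are made up, and the `cutoff` value is a hypothetical threshold for "close enough" that you would need to choose yourself.

```python
import numpy as np

def min_distances(rec, lig):
    """For each row of rec, return the distance to the nearest row of lig."""
    # rec[:, None, :] has shape (n_rec, 1, 3) and lig[None, :, :] has
    # shape (1, n_lig, 3), so the difference has shape (n_rec, n_lig, 3).
    diff = rec[:, None, :] - lig[None, :, :]
    dists = np.linalg.norm(diff, axis=2)   # pairwise distances, (n_rec, n_lig)
    return dists.min(axis=1)               # one value per row of rec, same order

# Made-up stand-ins for data_rec (file A) and data_ligand (file B).
rec = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
lig = np.array([[3.0, 4.0, 0.0], [10.0, 0.0, 1.0]])

result = min_distances(rec, lig)           # nearest distances: 5.0 and 1.0

cutoff = 4.0                               # hypothetical threshold, pick your own
close_idx = np.flatnonzero(result <= cutoff)  # row indices of A within the cutoff
print(result, close_idx)
```

The result keeps the order of file A, so something like `np.savetxt(out_path, result)` would write one value per line. Also note that a file opened in text mode on Windows already translates `"\n"` to `"\r\n"`, so writing `"\r\n"` yourself may leave a stray `\r` on each line.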
Upvotes: 1