Python calculate minimum distances between multiple coordinates

Question

I have two types of files: A contains 1206 lines of coordinates (xyz), a protein chain; B contains 114 lines of coordinates (xyz), a bunch of molecules.

I would like to do the following: for each line of A, calculate the distance from each line of B, so I get 114 distance values for each line of A. But I don't need all of them, just the shortest one for each line of A. So the desired output is a file with 1206 lines, each containing one value: the shortest distance. It is important to keep the original order of file A.

My code:

import os
import sys
import numpy as np



outdir = r'E:\MTA\aminosavak_tavolsag\tavolsagok'
for dirname, dirnames, filenames in os.walk(r'E:\MTA\aminosavak_tavolsag\receptorok'):
    for path, dirs, files in os.walk(r'E:\MTA\aminosavak_tavolsag\kotohely'):
        for filename in filenames:
            for fileok in files:
                if filename == fileok:
                    with open(os.path.join(outdir, filename) , "a+") as f:
                        data_ligand = np.loadtxt(os.path.join(path, fileok))
                        data_rec = np.loadtxt(os.path.join(dirname, filename))

                        for i in data_rec:
                            for j in data_ligand:

                                dist = np.linalg.norm(i - j)

                                dist_float = dist.tolist()  
                                dist_str = str(dist_float)
                                dist_list = dist_str.split()
                                for szamok in dist_list:
                                    for x in range(len(dist_list)):
                                        minimum = min([float(x) for x in dist_list])

                            f.write(str(minimum) + "\n")

This code works, but only partially. My ultimate goal is to find the protein residues that are close enough to this bunch of molecules (the binding site). I can check my results with visualization software, and my code finds far fewer residues than it should.

I just can't figure out where the problem is. Could you help me? Thanks!

kjaquier · Accepted Answer

Your code is pretty confusing, and I can see a few mistakes.

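You're writing minimum outside of the inner for j loop, but minimum is overwritten on every iteration of that loop, so only its last value (the one computed for the last line of B) is written.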
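To make the loop-structure point concrete, here is a minimal sketch that keeps an explicit double loop but resets the running minimum for each line of A and records it only after every line of B has been compared. The helper name `min_distances_loop` and the coordinates are made up for illustration; it assumes both inputs are (N, 3) arrays as returned by `np.loadtxt`.

```python
import numpy as np


def min_distances_loop(data_rec, data_ligand):
    """Return the shortest distance from each row of data_rec to any row of data_ligand.

    Hypothetical helper: data_rec is assumed to be an (N, 3) array and
    data_ligand an (M, 3) array of xyz coordinates.
    """
    result = []
    for i in data_rec:
        minimum = float("inf")          # reset for every line of A
        for j in data_ligand:
            dist = np.linalg.norm(i - j)
            if dist < minimum:          # keep the smallest distance seen so far
                minimum = dist
        result.append(float(minimum))   # store only after every line of B was checked
    return result


rec = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])  # made-up protein coordinates
lig = np.array([[1.0, 0.0, 0.0], [0.0, 3.0, 4.0]])   # made-up ligand coordinates
print(min_distances_loop(rec, lig))  # [1.0, 9.0]
```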

Also, the way you compute minimum is weird. szamok is not used, nor is x (since you use another x inside the list comprehension), so both for loops surrounding minimum = ... are useless.

Another suspicious thing is str(dist_float). dist is a single number (np.linalg.norm of the difference of two xyz rows is a scalar), so dist.tolist() gives one Python float, and str(...).split() turns it into a one-element list of strings. This round trip through strings is useless, and it also means min() only ever sees that one value, so minimum is just the distance to the current line of B, never the shortest one.
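A quick check with two made-up coordinates shows what that conversion chain actually produces: `np.linalg.norm` of a single difference vector is a scalar, so `tolist()` yields one Python float and the split gives a one-element list.

```python
import numpy as np

i = np.array([0.0, 0.0, 0.0])  # one protein coordinate (made-up values)
j = np.array([1.0, 2.0, 2.0])  # one ligand coordinate (made-up values)

dist = np.linalg.norm(i - j)         # a NumPy scalar, not an array
dist_float = dist.tolist()           # a plain Python float
dist_list = str(dist_float).split()  # a one-element list of strings

print(type(dist_float).__name__, dist_list)  # float ['3.0']
```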

Assuming i and j stand for the data lines of A and B, I would rewrite the end of your code like this:

...
data_ligand = np.loadtxt(os.path.join(path, fileok))
data_rec = np.loadtxt(os.path.join(dirname, filename))

for i in data_rec:
    min_dist = min(np.linalg.norm(i - j) for j in data_ligand)
    f.write("{}\n".format(min_dist))  # easier than `str(min_dist)` to customize format
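If the Python loops turn out to be slow, a vectorized alternative (a swapped-in technique, not part of the fix above) is NumPy broadcasting: compute all N x M distances as one array and take the minimum of each row. This is a minimal sketch, assuming both files load as (N, 3) and (M, 3) arrays; `min_distances` is a hypothetical helper name.

```python
import numpy as np


def min_distances(data_rec, data_ligand):
    """Shortest distance from each row of data_rec (N x 3) to any row of data_ligand (M x 3).

    Hypothetical helper. Returns a length-N array in the same order as data_rec.
    """
    # np.loadtxt returns a 1-D array for a one-line file; make both inputs 2-D.
    data_rec = np.atleast_2d(data_rec)
    data_ligand = np.atleast_2d(data_ligand)
    # Broadcasting (N, 1, 3) - (1, M, 3) gives all N x M difference vectors at once.
    diff = data_rec[:, np.newaxis, :] - data_ligand[np.newaxis, :, :]
    dists = np.linalg.norm(diff, axis=-1)  # shape (N, M)
    return dists.min(axis=1)               # shortest distance per line of A


rec = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])  # made-up protein coordinates
lig = np.array([[1.0, 0.0, 0.0], [0.0, 3.0, 4.0]])   # made-up ligand coordinates
mins = min_distances(rec, lig)
print(mins)  # [1. 9.]
```

For 1206 x 114 points the intermediate arrays are only a few megabytes. `np.savetxt(os.path.join(outdir, filename), min_distances(data_rec, data_ligand))` would write one value per line in A's original order. For the binding-site goal, `np.flatnonzero(mins <= cutoff)` gives the 0-based indices of A's lines within a cutoff you choose, and `scipy.spatial.distance.cdist(data_rec, data_ligand).min(axis=1)` is an equivalent one-liner if SciPy is available.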
