Reputation: 439
I am trying to write a Python function that takes two lists as input: one containing the SMILES codes of some molecules and another containing the molecule names.
It then calculates the Tanimoto coefficient between all pairs of molecules (I already have a function for this) and returns two new lists with the SMILES and names, respectively, of all molecules whose Tanimoto coefficient with any other molecule is not higher than a certain threshold.
This is what I have so far, but it gives wrong results (most of the molecules I get back are almost the same):
def TanimotoFilter(molist,namelist,threshold):
    # molist is the smiles list
    # namelist is the name list (SURPRISE!) is this some global variable name?
    # threshold is the tanimoto threshold (SURPRISE AGAIN!)
    smilesout=[]
    names=[]
    tans=[]
    exclude=[]
    for i in range(1,len(molist)):
        if i not in exclude:
            smilesout.append(molist[i])
            names.append(namelist[i])
            for j in range(i,len(molist)):
                if i==j:
                    tans.append('SAME')
                else:
                    tanimoto=tanimoto_calc(molist[i],molist[j])
                    if tanimoto>threshold:
                        exclude.append(j)
                        #print 'breaking for '+str(i)+' '+str(j)
                        break
                    else:
                        tans.append(tanimoto)
    return smilesout, names, tans
I'd be very thankful if the modifications you propose were as basic as possible, since this code is for people who have scarcely seen a terminal in their lives. It doesn't matter if it is full of loops that make it slow.
Thank you all!
Upvotes: 2
Views: 610
Reputation: 452
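I have made some changes to the logic of the function: the outer loop now starts at index 0 instead of 1, the inner loop starts at i + 1, and there is no break, so every molecule that is not already excluded is compared against all the molecules after it. As requested in the question, it returns two lists with the SMILES and the names. I am not sure about the purpose of tans, since a Tanimoto value belongs to a pair of molecules and not to a single molecule, so I no longer return it. I could not test the code without data, so let me know if this works.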
def TanimotoFilter(molist, namelist, threshold):
    # molist is the smiles list
    # namelist is the name list (SURPRISE!) is this some global variable name?
    # threshold is the tanimoto threshold (SURPRISE AGAIN!)
    smilesout=[]
    names=[]
    tans=[]
    exclude=[]
    for i in range(0, len(molist)):
        if i not in exclude:
            temp_exclude = []
            for j in range(i + 1, len(molist)):
                tanimoto = tanimoto_calc(molist[i], molist[j])
                if tanimoto > threshold:
                    temp_exclude.append(j)
            if temp_exclude:
                temp_exclude.append(i)
                exclude.extend(temp_exclude)
            else:
                smilesout.append(molist[i])
                names.append(namelist[i])
    return smilesout, names
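One case that may be worth checking with real data: because a molecule that is already in exclude is never used as i, a later molecule that is similar only to an excluded one appears to slip through. If the goal is strictly "keep a molecule only when it has no neighbour above the threshold", a plain all-pairs check is a simple alternative. The following is only a sketch under assumptions: toy_tanimoto is a hypothetical stand-in for tanimoto_calc (it compares sets of characters, not real fingerprints), and strict_tanimoto_filter and its similarity parameter are names made up for this example.

```python
def toy_tanimoto(a, b):
    # Hypothetical stand-in for tanimoto_calc, for testing only:
    # Tanimoto (Jaccard) index computed on the sets of characters.
    set_a = set(a)
    set_b = set(b)
    union = set_a | set_b
    if not union:
        return 1.0
    # float() keeps the division exact on Python 2 as well
    return len(set_a & set_b) / float(len(union))


def strict_tanimoto_filter(molist, namelist, threshold, similarity=toy_tanimoto):
    # Keep a molecule only if NO other molecule in the list is more
    # similar to it than the threshold. Every pair is checked, so the
    # result does not depend on the order of the input list.
    smilesout = []
    names = []
    for i in range(len(molist)):
        has_close_neighbour = False
        for j in range(len(molist)):
            if i != j and similarity(molist[i], molist[j]) > threshold:
                has_close_neighbour = True
                break  # one close neighbour is enough to reject i
        if not has_close_neighbour:
            smilesout.append(molist[i])
            names.append(namelist[i])
    return smilesout, names


# Toy data: A~B (0.75) and B~C (0.8), but A and C are only 0.6 apart
toy_smiles = ["abc", "abcd", "abcde", "xyz"]
toy_names = ["A", "B", "C", "D"]
print(strict_tanimoto_filter(toy_smiles, toy_names, 0.7))
# (['xyz'], ['D'])
```

To use it with the real function, pass similarity=tanimoto_calc. It computes each pair twice, which is slow but keeps the loops as basic as possible.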
Upvotes: 0