jonas87

Reputation: 674

Finding the relative position of molecular substructures with RDKit

I have a collection of fatty acid molecules (in SMILES format) in which I would like to find the positions of the C=C double bonds. By position I mean how many carbons away the double bond is from the first carbon, which is the carbon of the carboxyl group (counted as carbon 1).

For example, for the molecule below the answer would be 5 and 7. (The numbers in the picture denote RDKit atom indices.)

C(O)(=O)CCCC=CC=CCC

[Image: structure of the molecule with RDKit atom indices]
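As a quick check of the numbering, here is a minimal sketch (assuming a standard RDKit install) that lists the carbon-carbon double bonds of the first SMILES by RDKit atom index. RDKit numbers atoms in the order they appear in the SMILES, so the two carboxyl oxygens take indices 1 and 2 and the double bonds sit at indices (6, 7) and (8, 9), not at positions 5 and 7.

```python
from rdkit import Chem

# First SMILES from the question; RDKit assigns atom indices in SMILES order.
mol = Chem.MolFromSmiles("C(O)(=O)CCCC=CC=CCC")

# Collect (begin, end) atom indices of every carbon-carbon double bond.
cc_double = [
    (bond.GetBeginAtomIdx(), bond.GetEndAtomIdx())
    for bond in mol.GetBonds()
    if bond.GetBondType() == Chem.BondType.DOUBLE
    and bond.GetBeginAtom().GetSymbol() == "C"
    and bond.GetEndAtom().GetSymbol() == "C"
]
print(cc_double)  # [(6, 7), (8, 9)]
```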

This would be simple with a regex search directly on the SMILES string, but that turns out not to work, because the carbons need not appear in the SMILES string in chain order (the string may not be 'linear'). For example, the following SMILES represents the same molecule:

C(C=CCCCC(=O)O)=CCC

[Image: the same molecule with RDKit atom indices]
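A minimal sketch to confirm that the two SMILES describe the same molecule, assuming RDKit is available: the canonical SMILES produced by Chem.MolToSmiles should be identical for both inputs, even though the input atom order (and therefore every atom index) differs.

```python
from rdkit import Chem

smiles_a = "C(O)(=O)CCCC=CC=CCC"
smiles_b = "C(C=CCCCC(=O)O)=CCC"

# Canonical SMILES do not depend on the atom order of the input string.
canon_a = Chem.MolToSmiles(Chem.MolFromSmiles(smiles_a))
canon_b = Chem.MolToSmiles(Chem.MolFromSmiles(smiles_b))
same_molecule = canon_a == canon_b
print(same_molecule)  # True
```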

Is there a way to solve this problem with RDKit? Simply finding the C=C double bond substructure and the carboxyl substructure and subtracting their atom indices will not work. Somehow you would have to count the number of carbons in the chain between the two substructure matches?
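To illustrate why plain index arithmetic fails, here is a minimal sketch of that naive approach; the function name naive_index_differences is hypothetical, and the SMARTS patterns C(=O)[OH] and C=C are assumptions for matching the carboxyl group and the alkene. Neither SMILES yields the expected 5 and 7, because the oxygens and the branching change which indices the chain carbons receive.

```python
from rdkit import Chem

# Hypothetical patterns: the first query atom of CARBOXYL is the carboxyl carbon.
CARBOXYL = Chem.MolFromSmarts("C(=O)[OH]")
ALKENE = Chem.MolFromSmarts("C=C")  # aliphatic carbon-carbon double bond

def naive_index_differences(smiles):
    """Subtract raw atom indices (the approach the question rules out)."""
    mol = Chem.MolFromSmiles(smiles)
    c1 = mol.GetSubstructMatch(CARBOXYL)[0]
    # Lower atom index of each C=C match minus the carboxyl carbon index.
    return sorted(abs(min(match) - c1) for match in mol.GetSubstructMatches(ALKENE))

for smi in ("C(O)(=O)CCCC=CC=CCC", "C(C=CCCCC(=O)O)=CCC"):
    print(smi, naive_index_differences(smi))
# C(O)(=O)CCCC=CC=CCC [6, 8]
# C(C=CCCCC(=O)O)=CCC [5, 6]
```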

Upvotes: 3

Views: 699

Answers (1)

rapelpy

Reputation: 1869

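Given the RDKit atom indices, you can use Chem.GetShortestPath. It returns the indices of the atoms on the shortest path between two atoms, including both end atoms, so the length of the path from the carboxyl carbon to the nearer carbon of a C=C bond is the position of that double bond.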

For your second molecule:

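from rdkit import Chem
mol = Chem.MolFromSmiles('C(C=CCCCC(=O)O)=CCC')  # the carboxyl carbon is atom 6
print(len(Chem.GetShortestPath(mol, 6, 2)))  # 5
print(len(Chem.GetShortestPath(mol, 6, 0)))  # 7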
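The same idea can be applied to a whole collection without looking up indices by hand. This is a minimal sketch, assuming each molecule has exactly one free carboxylic acid group (matched with the SMARTS C(=O)[OH]) and an unbranched carbon chain; the function name double_bond_positions is hypothetical. For each C=C bond it takes the shorter of the two paths, so the bond is numbered by its lower locant.

```python
from rdkit import Chem

# Hypothetical pattern: the first query atom is the carboxyl carbon (C1).
CARBOXYL = Chem.MolFromSmarts("C(=O)[OH]")

def double_bond_positions(smiles):
    """Return sorted C=C positions, counting the carboxyl carbon as carbon 1."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    matches = mol.GetSubstructMatches(CARBOXYL)
    if len(matches) != 1:
        raise ValueError("expected exactly one carboxyl group")
    c1 = matches[0][0]
    positions = []
    for bond in mol.GetBonds():
        a, b = bond.GetBeginAtom(), bond.GetEndAtom()
        if (bond.GetBondType() == Chem.BondType.DOUBLE
                and a.GetSymbol() == "C" and b.GetSymbol() == "C"):
            # The path includes both end atoms, so its length is the locant.
            positions.append(min(len(Chem.GetShortestPath(mol, c1, idx))
                                 for idx in (a.GetIdx(), b.GetIdx())))
    return sorted(positions)

print(double_bond_positions("C(O)(=O)CCCC=CC=CCC"))  # [5, 7]
print(double_bond_positions("C(C=CCCCC(=O)O)=CCC"))  # [5, 7]
```

Iterating over the bonds, rather than matching a C=C pattern, avoids having to reason about the order of atoms inside each match.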

Upvotes: 4
