Reputation: 674
I have a collection of fatty acid molecules (in SMILES format) in which I would like to find the positions of the C=C double bonds. Position meaning: count how many carbons away the double bond is from the first carbon, which is the carbon of the carboxyl group.
For example, for the molecule below the answer would be 5 and 7. (The numbers in the picture denote RDKit atom indices.)
C(O)(=O)CCCC=CC=CCC
This would be easy with a regex search directly on the SMILES string, but that does not work because the order of the carbons in the SMILES string need not follow the chain ('linear' order). For example, the following SMILES represents the same molecule:
C(C=CCCCC(=O)O)=CCC
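One way to confirm that the two strings really describe the same molecule is to compare their canonical SMILES. This is only a minimal sketch; the variable names are mine, and it assumes RDKit is installed:

```python
from rdkit import Chem

smiles_a = "C(O)(=O)CCCC=CC=CCC"
smiles_b = "C(C=CCCCC(=O)O)=CCC"

# MolToSmiles writes canonical SMILES by default, so two different
# spellings of the same molecule should give the same string.
canonical_a = Chem.MolToSmiles(Chem.MolFromSmiles(smiles_a))
canonical_b = Chem.MolToSmiles(Chem.MolFromSmiles(smiles_b))

same_molecule = canonical_a == canonical_b
print(same_molecule)
```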
Is there a way to solve this problem with RDKit? Simply matching the C=C double bond substructure and the carboxyl substructure and subtracting their atom indices will not work. Somehow you would have to count the number of carbons in the chain between the two substructure matches.
Upvotes: 3
Views: 699
Reputation: 1869
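With the RDKit atom indices you can use Chem.GetShortestPath. It returns the atom indices along the shortest path between two atoms, both endpoints included, so the length of that path is the position you are looking for.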
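For your second molecule, where atom 6 is the carboxyl carbon and atoms 2 and 0 are the carbons of each double bond nearest to it:
from rdkit import Chem
mol = Chem.MolFromSmiles('C(C=CCCCC(=O)O)=CCC')
print(len(Chem.GetShortestPath(mol, 6, 2)))
5
print(len(Chem.GetShortestPath(mol, 6, 0)))
7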
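If you do not want to look up the atom indices by hand, the same idea can probably be combined with substructure matching. The following is a rough sketch rather than a tested general solution: the function name double_bond_positions and the two SMARTS patterns are my own choices, and it assumes each molecule has exactly one carboxyl group and an unbranched chain.

```python
from rdkit import Chem

# Assumed patterns: a protonated carboxyl group and any C=C double bond.
CARBOXYL = Chem.MolFromSmarts("C(=O)[OH]")
DOUBLE_BOND = Chem.MolFromSmarts("C=C")


def double_bond_positions(smiles):
    """Return sorted C=C positions, counting the carboxyl carbon as 1."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    acid_matches = mol.GetSubstructMatches(CARBOXYL)
    if len(acid_matches) != 1:
        raise ValueError("expected exactly one carboxyl group")
    acid_carbon = acid_matches[0][0]  # first atom of the pattern is the carbon

    positions = []
    for atom_a, atom_b in mol.GetSubstructMatches(DOUBLE_BOND):
        # Path length in atoms (endpoints included) to each end of the bond;
        # the smaller one is the carbon where the double bond starts.
        dist_a = len(Chem.GetShortestPath(mol, acid_carbon, atom_a))
        dist_b = len(Chem.GetShortestPath(mol, acid_carbon, atom_b))
        positions.append(min(dist_a, dist_b))
    return sorted(positions)


print(double_bond_positions("C(O)(=O)CCCC=CC=CCC"))   # [5, 7]
print(double_bond_positions("C(C=CCCCC(=O)O)=CCC"))   # [5, 7]
```

Both spellings of your molecule give the same result because the path is counted on the molecular graph, not on the order of atoms in the SMILES string. For carboxylates or esters the carboxyl pattern would need to be adjusted.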
Upvotes: 4