meadeytabeedy

Reputation: 31

Accessing the output of RDKit Chem.FindAllSubgraphsOfLengthN(mol, n)

I am trying to use the RDKit function Chem.FindAllSubgraphsOfLengthN(mol, n), but I cannot read the information it returns. The call runs without error, but I cannot get at the substructures.

Does anyone have suggestions for extracting the information from this function's output?

I am expecting output, as either tuples or strings, that lists all substructures of N atoms. In my case I am looking for substructures of 4 atoms. Below is the code I ran, together with the output it produced.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import AllChem
AllChem.SetPreferCoordGen(True)
from rdkit.Chem import rdmolops

mol = Chem.MolFromSmiles('C=C(S)C(N)(O)C')
mol
structures = Chem.FindAllSubgraphsOfLengthN(mol,4)
structures
print(structures[1])

Output: <rdkit.rdBase._vectint object at 0x0000024C16BBA2E0>

structures[0]

Output: <rdkit.rdBase._vectint at 0x24c16ba62e0>

Upvotes: 1

Views: 434

Answers (2)

meadeytabeedy

Reputation: 31

A team member of mine wrote code that accesses the path or subgraph information. To clarify: paths are unbranched, whereas subgraphs may be branched.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import AllChem
AllChem.SetPreferCoordGen(True)
from rdkit.Chem import rdmolops
import csv

mol = Chem.MolFromSmiles('C=C(S)C(N)(O)C')
mol


# Find all subgraphs made of 3 bonds (4 atoms in this acyclic molecule).
# Subgraphs may be branched.
subgraphs = Chem.FindAllSubgraphsOfLengthN(mol, 3)

# Paths are unbranched; uncomment the next line to get only linear paths.
# subgraphs = Chem.FindAllPathsOfLengthN(mol, 3)
print(len(subgraphs))

# Print out the connected SMILES for each subgraph
for subgraph in subgraphs:
    # Get the subgraph as a new molecule object
    sub_mol = Chem.PathToSubmol(mol, subgraph)
    # Generate the connected SMILES string for the subgraph
    subgraph_smiles = Chem.MolToSmiles(sub_mol, kekuleSmiles=True)
    print(subgraph_smiles)

Output
11
C=CCC
C=CCO
C=CCN
C=C(C)S
CCCS
OCCS
NCCS
CC(C)O
CC(C)N
CC(N)O
CC(N)O
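In the output above, CC(N)O appears twice because two different subgraphs give the same SMILES. If only the distinct fragments matter, the strings can be tallied. This is a minimal sketch, assuming the same molecule and the same PathToSubmol route as above; the name smiles_counts is only for illustration.

```python
from collections import Counter
from rdkit import Chem

mol = Chem.MolFromSmiles('C=C(S)C(N)(O)C')

# Each subgraph is a sequence of bond indices (3 bonds here).
subgraphs = Chem.FindAllSubgraphsOfLengthN(mol, 3)

# Count how many subgraphs map to each SMILES string.
smiles_counts = Counter(
    Chem.MolToSmiles(Chem.PathToSubmol(mol, sg)) for sg in subgraphs
)

for smi, count in sorted(smiles_counts.items()):
    print(smi, count)
```

The keys of smiles_counts are the unique fragment SMILES, and the values say how many subgraphs produced each one.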

Upvotes: 0

rapelpy

Reputation: 1869

You have to convert each result to a list with list().

And since FindAllSubgraphsOfLengthN returns bond indices, not atom indices, you have to look for three bonds to get substructures of four atoms.

from rdkit import Chem
from rdkit.Chem import rdDepictor
rdDepictor.SetPreferCoordGen(True)
from rdkit.Chem.Draw import IPythonConsole
IPythonConsole.drawOptions.addBondIndices = True

mol = Chem.MolFromSmiles('C=C(S)C(N)(O)C')
mol


threebonds = Chem.FindAllSubgraphsOfLengthN(mol, 3)

for n in threebonds:
    print(list(n))

Output:

[0, 2, 5]
[0, 2, 4]
[0, 2, 3]
[0, 2, 1]
[1, 2, 5]
[1, 2, 4]
[1, 2, 3]
[2, 5, 4]
[2, 5, 3]
[2, 4, 3]
[3, 5, 4]
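If atom indices are needed rather than bond indices, each bond index can be looked up on the molecule and its two end atoms collected. This is a rough sketch, not part of RDKit itself: bond_path_to_atoms is a hypothetical helper name, and it assumes the same molecule as above.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('C=C(S)C(N)(O)C')


def bond_path_to_atoms(mol, bond_indices):
    """Return the sorted atom indices touched by the given bond indices."""
    atoms = set()
    for idx in bond_indices:
        bond = mol.GetBondWithIdx(idx)
        atoms.add(bond.GetBeginAtomIdx())
        atoms.add(bond.GetEndAtomIdx())
    return sorted(atoms)


threebonds = Chem.FindAllSubgraphsOfLengthN(mol, 3)

# One list of atom indices per subgraph, in the same order as threebonds.
atom_sets = [bond_path_to_atoms(mol, list(n)) for n in threebonds]

for n, atoms in zip(threebonds, atom_sets):
    print(list(n), '->', atoms)
```

Because this molecule has no rings, every three-bond subgraph covers exactly four atoms. In a ring system, three bonds can close on themselves and cover only three atoms, so the atom count is worth checking there.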

Upvotes: 1
