Reputation: 31
I am interested in obtaining fragments, or sub structures, that contain 4 non-hydrogen atoms within a larger molecule.
The closest example to accomplishing this is referenced from https://iwatobipen.wordpress.com/2020/08/12/get-and-draw-molecular-fragment-with-user-defined-path-rdkit-memo/
I used their work but I don't believe "ALL" 4 atom substructures are found within their methodology. They use the concept of "radius" around centered atoms. This doesn't distinguish all possible substructures I am looking for.
Does anyone have suggestions for improving this code to obtaining my goal or a differing direction to go for future work?
I followed the method presented by the link above. Various substructures are presented but not 'ALL' 4 atom fragments are specified when running this method.
Substructure example I don't see using this methodology but I believe I should see at radii 1 or radii 2.1]1
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import AllChem
AllChem.SetPreferCoordGen(True)
def getSubmolRadN(mol, radius):
atoms=mol.GetAtoms()
submols=[]
for atom in atoms:
env=Chem.FindAtomEnvironmentOfRadiusN(mol, radius, atom.GetIdx())
amap={}
submol=Chem.PathToSubmol(mol, env, atomMap=amap)
subsmi=Chem.MolToSmiles(submol, rootedAtAtom=amap[atom.GetIdx()], canonical=False)
submols.append(Chem.MolFromSmiles(subsmi, sanitize=False))
return submols
mol = Chem.MolFromSmiles('C=C(S)C(N)(O)C')
submols = getSubmolRadN(mol,1)
Draw.MolsToGridImage(submols, highlightAtomLists=[[0] for _ in range(len(submols))], molsPerRow=5)
submols = getSubmolRadN(mol,2)
Draw.MolsToGridImage(submols, highlightAtomLists=[[0] for _ in range(len(submols))], molsPerRow=5)
submols = getSubmolRadN(mol,3)
Draw.MolsToGridImage(submols, highlightAtomLists=[[0] for _ in range(len(submols))], molsPerRow=5)
Outputs for Radii 1 and 2, radii 3 I obtain an error
Upvotes: 1
Views: 1982
Reputation: 31
Previous user (rapelpy) answered the question as well. A team member of mine was able to generate a code as to obtain the paths or subgraphs. To clarify, paths don't have branching where subgraphs do.
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import AllChem
AllChem.SetPreferCoordGen(True)
from rdkit.Chem import rdmolops
import csv
mol = Chem.MolFromSmiles('C=C(S)C(N)(O)C')
mol
# Find all subgraphs of length 3 in the molecule
#Subgraphs have branching
subgraphs = Chem.FindAllSubgraphsOfLengthN(mol, 3)
#Paths is no branching
#subgraphs = Chem.FindAllPathsOfLengthN(mol, 3)
print(len(subgraphs))
# Print out the connected SMILES for each subgraph
for subgraph in subgraphs:
# Get the subgraph as a new molecule object
sub_mol = Chem.PathToSubmol(mol, subgraph)
# Generate the connected SMILES string for the subgraph
subgraph_smiles = Chem.MolToSmiles(sub_mol, kekuleSmiles=True)
print(subgraph_smiles)
Output
11
C=CCC
C=CCO
C=CCN
C=C(C)S
CCCS
OCCS
NCCS
CC(C)O
CC(C)N
CC(N)O
CC(N)O
Upvotes: 0
Reputation: 1869
To find 4 connected non-hydrogen atoms you can use SMARTS
.
Since a connection of 4 can be linear or have one branch and must not end with hydrogen (important if you have explicit hydrogen), these two SMARTS should work.
from rdkit import Chem
from rdkit.Chem import Draw, rdDepictor
rdDepictor.SetPreferCoordGen(True)
from rdkit.Chem.Draw import IPythonConsole
IPythonConsole.molSize = (200, 150)
smarts_linear = Chem.MolFromSmarts('[!#1]~*~*~[!#1]')
smarts_branch = Chem.MolFromSmarts('[!#1]~*(~[!#1])~[!#1]')
smarts_linear
smarts_branch
For better testing I will use a molecule with explicit hydrogen.
mol = Chem.AddHs(Chem.MolFromSmiles('C=C(S)C(N)(O)C'))
mol
sub_linear = mol.GetSubstructMatches(smarts_linear)
sub_branch = mol.GetSubstructMatches(smarts_branch)
all_subs = sub_linear + sub_branch
found_subs = []
for n in range(len(all_subs)):
found_subs.append(mol)
Draw.MolsToGridImage(found_subs, molsPerRow=4, highlightAtomLists=all_subs, subImgSize=(200,150))
Upvotes: 3