Wychh
Wychh

Reputation: 713

RDKIT: Find Substructure Atom Coordinates

I have imported a molecule as a .mol file into rdkit. The molecule contains a CN=NC substructure. I wish to find the coordinates of the CN=NC substructure.

I have tried using Chem.MolToBlock(molfile) to list the 3D coordinates; however, this returns the 3D coordinates of the entire molecule.

The basis of my code has been:

molecule = rdkit.Chem.MolFromMolFile('molfile')
query = rdkit.Chem.MolFromSmiles('CN=NC')`
subatomids = m.GetSubstructMatch(q)

However, I do not know if there is a simple way to return the coordinates of the specific atoms

The ideal result would:

C = x y z
N = x y z
N = x y z
C = x y z

or something similar.

Upvotes: 1

Views: 1987

Answers (1)

rapelpy
rapelpy

Reputation: 1869

I will use a molblock instead of a .mol file but it works for both. In my sample molblock your substructure are the atoms 2-5.

To get the coordinates you need the conformer of the molecule and with the ID's from the substructure search you can call the elements.

from rdkit import Chem

molblock = '''
cn=nc substructure
sample for stackoverflow
 16 15  0  0  0  0            999 V2000
   -2.6048   -0.8132    0.1394 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8319    0.4361   -0.2883 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5126    0.4361    0.3487 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.5126    0.4361   -0.3487 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.8319    0.4361    0.2883 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.6048   -0.8132   -0.1394 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.0542   -1.7032   -0.1653 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.7245   -0.8132    1.2228 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.5863   -0.8132   -0.3346 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7122    0.4361   -1.3717 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3825    1.3260    0.0164 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.3825    1.3260   -0.0164 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.7122    0.4360    1.3717 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.0542   -1.7032    0.1653 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.7245   -0.8132   -1.2228 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.5863   -0.8132    0.3346 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  7  1  0  0  0  0
  1  8  1  0  0  0  0
  1  9  1  0  0  0  0
  2  3  1  0  0  0  0
  2 10  1  0  0  0  0
  2 11  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  1  0  0  0  0
  5 12  1  0  0  0  0
  5 13  1  0  0  0  0
  6 14  1  0  0  0  0
  6 15  1  0  0  0  0
  6 16  1  0  0  0  0
M  END'''

m = Chem.MolFromMolBlock(molblock)
#m = Chem.MolFromMolFile('theMolFile')

conf = m.GetConformer()

patt = Chem.MolFromSmiles('CN=NC')

sub = m.GetSubstructMatch(patt)

for s in sub:
    print(m.GetAtoms()[s].GetSymbol(), list(conf.GetAtomPosition(s)))

Output:

C [-1.8319, 0.4361, -0.2883]
N [-0.5126, 0.4361, 0.3487]
N [0.5126, 0.4361, -0.3487]
C [1.8319, 0.4361, 0.2883]

Upvotes: 4

Related Questions