Reputation: 17
A new image-based deep-learning algorithm for drug discovery requires splitting a file containing ~3000 chemical compounds into PNG files, each holding an individual 2D 200 x 200 pixel image (e.g. SN00001400.png, SN00002805.png, SN00002441.png, ...). I do not need any conformers or any other 3D information.
I could send an initial example, f1.sdf, containing 9 compounds with their images, names and SMILES, one per compound row.
I am using rdkit 2017.09.1 in Anaconda3 with Python 3.6, 3.7 or 3.8, from Jupyter notebooks and/or the Python prompt, on 2 e7 64 computers running Windows 8 Professional. I am looking for simple Python code to split the file into individual compounds, convert each one to a 200 x 200 pixel PNG file (Cairo), name it by its corresponding compound ID and save it into a separate directory (e.g. images), ready to be tested.
I tried many different code snippets from the web, in many combinations, but despite intensive testing they did not work :-(.
Below are some of my best (?) attempts.
rdkit imports tested:
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit.Chem.Draw.rdMolDraw2D import MolDraw2DSVG
from rdkit.Chem.Draw.rdMolDraw2D import MolDraw2DCairo # cannot import
from rdkit.Chem.Draw import IPythonConsole
from IPython.display import SVG # IPython not in module
from rdkit.Chem import rdDepictor
from rdkit.Chem import MolFromSmiles
Best test using a single SMILES:
IPythonConsole.molSize = (200, 200)
IPythonConsole.ipython_useSVG = True #I would rather use Cairo but I could not make it to work!
mol = Chem.MolFromSmiles('N#Cc1cccc(-c2nc(-c3cccnc3)no2)c1')
display(mol) # not working
AllChem.Compute2DCoords(mol)
I tried different SMILES with similarly negative results, along these lines:
IMG_SIZE = 200
smiles="CCCC"
mol = Chem.MolFromSmiles(smiles)
drawer = rdMolDraw2D.MolDraw2DSVG(IMG_SIZE, IMG_SIZE) #MolDraw2D has no attribute MolDraw2DCairo despite cairo being installed!
drawer.drawOptions().bondLineWidth = 1
drawer.DrawMolecule(mol) # bad conformer id (?????)
drawer.FinishDrawing()
drawer.WriteDrawingText('comp_id.png')
Best attempts using the 9 compounds in f1.sdf:
suppl=Chem.SDMolSupplier('f1.sdf')
for mol in suppl:
    print(mol.GetName()) # AttributeError: 'Mol' object has no attribute 'GetMolecule_Name'
mols=[x for x in suppl]
Name(mols)
suppl = Chem.SDMolSupplier('f1.sdf')
ms= [x for x in suppl if x is not None]
for m in ms:
    tmp=AllChem.Compute2DCoords(m)
Draw.MolToFile(ms[0], 'images/mol1.png') # cairo.IOError: error while writing to output stream
Draw.MolToFile(ms[1], 'images/mol2.png')
....................................................................
Hoping to get some help! Thanks for your attention, sincerely Julio
Upvotes: 0
Views: 3479
Reputation: 17
YES !!
IT WORKED BEAUTIFULLY !!!
I will be calling it: Oliver.py
After sleeping on it, I just woke up with another solution (see below). Perhaps yours is better, since it allows me to define the width of the lines to be drawn.
I really appreciated your help! Now I can convert my "gold" files to test the deep-learning model!!!
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
suppl = Chem.SDMolSupplier('f1.sdf')
mols = [x for x in suppl if x is not None]  # skip records RDKit could not parse
count = 0
for m in mols:
    nombre = m.GetProp("comp_id")  # compound ID stored as an SDF data field
    AllChem.Compute2DCoords(m)  # generate 2D coordinates for drawing
    Draw.MolToFile(m, 'images/' + nombre + '.png', size=(200, 200), kekulize=True, wedgeBonds=False, imageType=None, fitImage=False, options=None)
    count += 1
print('ROWS CONVERTED TO IMAGES: ', count)
Upvotes: 0
Reputation: 1783
If the names of your molecules are available in the title line of your SDF file, you can access them as a property with the key '_Name'. Other properties can also be read from the SDF using their corresponding keys. Take the following SDF for example:
CHEMBL1308
                    3D
Structure written by MMmdl.
 12 12  0  0  1  0            999 V2000
   -0.0127    0.0114   -0.0000 C   0  0  0  0  0  0
    1.4966    0.0081   -0.0000 C   0  0  0  0  0  0
    2.3688   -1.0939    0.0000 C   0  0  0  0  0  0
    3.6409   -0.7653    0.0000 N   0  0  0  0  0  0
    3.6278    0.5682   -0.0000 N   0  0  0  0  0  0
    2.3638    1.0896   -0.0000 C   0  0  0  0  0  0
   -0.4346    1.0168    0.0000 H   0  0  0  0  0  0
   -0.4074   -0.5191   -0.8666 H   0  0  0  0  0  0
   -0.4074   -0.5191    0.8666 H   0  0  0  0  0  0
    2.0644   -2.1303    0.0000 H   0  0  0  0  0  0
    4.4779    1.1136   -0.0000 H   0  0  0  0  0  0
    2.2002    2.1571   -0.0000 H   0  0  0  0  0  0
  1  2  1  0  0  0
  1  7  1  0  0  0
  1  8  1  0  0  0
  1  9  1  0  0  0
  2  3  1  0  0  0
  2  6  2  0  0  0
  3  4  2  0  0  0
  3 10  1  0  0  0
  4  5  1  0  0  0
  5  6  1  0  0  0
  5 11  1  0  0  0
  6 12  1  0  0  0
M  END
> <SYNONYMS>
Fomepizole (BAN, FDA, INN, USAN)

> <USAN_STEM>
nan

$$$$
The name of the compound (CHEMBL1308) can be accessed like so, assuming mol is an RDKit molecule:
mol_id = mol.GetProp('_Name')
And the other properties can be accessed like so:
property = mol.GetProp('SYNONYMS')
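To make the property lookup concrete, here is a minimal sketch that builds a one-record SDF in memory and reads the title line and a data field back. The molecule, the title "DEMO0001" and the "comp_id" value are made-up illustration values, not taken from your file.

```python
import io

from rdkit import Chem

# Build a tiny SDF in memory so the example needs no input file.
# "DEMO0001" and "SN00000001" are hypothetical values for illustration only.
mol = Chem.MolFromSmiles("CCO")
mol.SetProp("_Name", "DEMO0001")      # becomes the SDF title line
mol.SetProp("comp_id", "SN00000001")  # becomes a "> <comp_id>" data field
buffer = io.StringIO()
writer = Chem.SDWriter(buffer)
writer.write(mol)
writer.flush()

# Read the SDF text back, as SDMolSupplier would from a file on disk.
supplier = Chem.SDMolSupplier()
supplier.SetData(buffer.getvalue())
record = supplier[0]

mol_id = record.GetProp("_Name")          # title line
comp_id = record.GetProp("comp_id")       # data field
field_names = list(record.GetPropNames()) # public data fields in this record

# HasProp lets you test for a field before reading it.
has_stem = bool(record.HasProp("USAN_STEM"))

print(mol_id, comp_id, field_names, has_stem)
```

Checking HasProp first is useful because GetProp raises a KeyError when the key is missing from a record.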
Thus a simple way to generate the images you need would be like so:
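from rdkit.Chem.Draw import rdMolDraw2D
from rdkit.Chem import AllChem
from rdkit import Chem
img_size = (200, 200)
supplier = Chem.SDMolSupplier('mols.sdf')
for mol in supplier:
    if mol is None:  # SDMolSupplier yields None for records it cannot parse
        continue
    AllChem.Compute2DCoords(mol)  # replace the 3D coordinates with a 2D depiction
    mol_id = mol.GetProp('_Name')  # title line of the SDF record
    d = rdMolDraw2D.MolDraw2DCairo(*img_size)
    d.DrawMolecule(mol)
    d.FinishDrawing()
    d.WriteDrawingText(f'images/{mol_id}.png')  # assumes the images directory already exists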
Obviously, you can adapt this to what you require.
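One possible adaptation, as a rough sketch: create the output directory if it is missing, skip records that fail to parse, and fall back to a generated name when the title line is empty. The helper names safe_png_path and sdf_to_pngs, the mol_NNNNN fallback name and the set of allowed file-name characters are my own assumptions, not part of RDKit.

```python
import os
import re


def safe_png_path(mol_id, out_dir="images"):
    """Return '<out_dir>/<mol_id>.png' with unsafe file-name characters replaced by '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", mol_id.strip())
    return os.path.join(out_dir, f"{cleaned}.png")


def sdf_to_pngs(sdf_path, out_dir="images", size=(200, 200)):
    """Write one PNG per parseable SDF record and return how many were written."""
    # RDKit is imported here so the path helper above can be used without it.
    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem.Draw import rdMolDraw2D

    os.makedirs(out_dir, exist_ok=True)  # create the output directory if needed
    written = 0
    for index, mol in enumerate(Chem.SDMolSupplier(sdf_path)):
        if mol is None:  # record could not be parsed
            continue
        mol_id = mol.GetProp("_Name") if mol.HasProp("_Name") else ""
        if not mol_id.strip():
            mol_id = f"mol_{index:05d}"  # hypothetical fallback name
        AllChem.Compute2DCoords(mol)
        drawer = rdMolDraw2D.MolDraw2DCairo(*size)
        drawer.DrawMolecule(mol)
        drawer.FinishDrawing()
        drawer.WriteDrawingText(safe_png_path(mol_id, out_dir))
        written += 1
    return written
```

Calling sdf_to_pngs('mols.sdf') should then return the number of images written. If your IDs live in a data field rather than the title line, swap '_Name' for that field's key.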
Upvotes: 0
Reputation: 17
You were right!
I ran "conda install -c conda-forge rdkit" in a newly created Anaconda3 environment, and most of the commands suddenly WORKED!!! THANK YOU VERY MUCH!!!!
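For anyone hitting the same "cannot import MolDraw2DCairo" problem, a small check like the one below might help confirm whether an environment's RDKit build exposes the Cairo drawer. It is only a sketch, and the helper name cairo_drawer_available is made up for illustration.

```python
import importlib


def cairo_drawer_available():
    """Return True if this environment's RDKit exposes MolDraw2DCairo."""
    try:
        module = importlib.import_module("rdkit.Chem.Draw.rdMolDraw2D")
    except ImportError:
        # RDKit itself (or its drawing module) is not importable here.
        return False
    # Builds without Cairo support lack this class.
    return hasattr(module, "MolDraw2DCairo")


print("MolDraw2DCairo available:", cairo_drawer_available())
```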
I developed the code below, but I got stuck because I cannot find a way to transfer each comp_id to the name of the corresponding png file that holds the beautiful image. Any ideas? THANKS!!!
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit.Chem.Draw.rdMolDraw2D import MolDraw2DSVG
from rdkit.Chem.Draw.rdMolDraw2D import MolDraw2DCairo
from rdkit.Chem.Draw import MolToFile
from rdkit.Chem import rdDepictor
from rdkit.Chem import MolFromSmiles
suppl = Chem.SDMolSupplier('f1.sdf')
for mol in suppl:
    print(mol.GetProp("comp_id"))
mols= [x for x in suppl]
for m in mols:
    tmp=AllChem.Compute2DCoords(m)
Draw.MolToFile(mols[0],'images/3333.png', size=(200,200), kekulize = True, wedgeBonds = False,imageType=None, fitImage=False, options=None) # did not get the comp_id but could transfer some attributes
Draw.MolToFile(mols[1], 'images/'+"comp_id"+'a.png') # did not get the idea
Upvotes: 0