sam

Reputation: 19164

Retrieve all molecules from a SMILES file

I am working with .smiles files. The SMILES format is described here: http://en.wikipedia.org/wiki/Chemical_file_format#SMILES

I want to get all the atoms from the SMILES file, hydrogens included. For example, if there is a single 'C' atom, I expect there to be 4 'H' atoms connected to it.

While searching, I found some Python modules that can parse the SMILES format, but they do not report the implicit hydrogen atoms (for example, they only give the 'C' atom and not the 4 'H' atoms connected to it).

How can I find all the atoms, including the connected 'H' atoms, using Python?
Here is an example SMILES string that needs to be expanded into all of its atoms, including the connected 'H' atoms:

[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]

Thank you in advance.

Upvotes: 3

Views: 3840

Answers (5)

Jayaram

Reputation: 839

RDKit is a well-established cheminformatics library for Python.

To read a molecule from SMILES:

from rdkit import Chem

m = Chem.MolFromSmiles('[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]')

Once the SMILES string has been read into an RDKit molecule, you can do almost anything with it. Documentation: http://www.rdkit.org
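For the original question (listing every atom, hydrogens included), a minimal sketch with RDKit could look like the following. It assumes a recent RDKit is installed; `all_atoms` is a hypothetical helper name, while `Chem.MolFromSmiles`, `Chem.AddHs`, `GetAtoms` and `GetSymbol` are standard RDKit calls. By default RDKit stores hydrogens as counts on the heavy atoms, so `Chem.AddHs` is what turns them back into real atoms.

```python
from collections import Counter

from rdkit import Chem


def all_atoms(smiles):
    """Return the element symbol of every atom, hydrogens included (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # By default MolFromSmiles keeps hydrogens as counts on heavy atoms;
    # AddHs adds them back to the graph as explicit atoms.
    mol_h = Chem.AddHs(mol)
    return [atom.GetSymbol() for atom in mol_h.GetAtoms()]


# Ethanol: 2 carbons, 1 oxygen and 6 hydrogens once AddHs has run.
print(Counter(all_atoms("CCO")))
```

The same call should work on the long SMILES string from the question, since its bracketed [H] atoms are first folded into hydrogen counts and then re-added by `AddHs`.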

Upvotes: 0

Klaus-Dieter Warzecha

Reputation: 2325

For the molecular weight of a compound given as SMILES, the Python bindings of Open Babel (pybel) should work:

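# Open Babel 3.x; with Open Babel 2.x use "import pybel" instead
from openbabel import pybel

mol = next(pybel.readfile("smi", "stuff.smi"))  # first molecule in the file
print(mol.molwt)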
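A molecular weight only comes out right if the hydrogens are counted, which is why dropping them (as the question describes) matters. As a hand-worked illustration, MW = sum over elements of (number of atoms of that element) × (atomic weight). The sketch below is hypothetical and does not use Open Babel; it assumes rounded standard atomic weights for just four elements.

```python
# Hypothetical helper for illustration only; rounded standard atomic
# weights for a few common elements (not a complete table).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(counts):
    """Sum count * atomic weight over the elements in `counts`."""
    return sum(n * ATOMIC_WEIGHTS[element] for element, n in counts.items())


ethanol = {"C": 2, "H": 6, "O": 1}     # C2H6O, hydrogens included
ethanol_heavy_only = {"C": 2, "O": 1}  # what remains if hydrogens are dropped

print(round(molecular_weight(ethanol), 3))
print(round(molecular_weight(ethanol_heavy_only), 3))
```

For ethanol this gives about 46.07 with the hydrogens and about 40.02 without them.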

Upvotes: 3

pradyunsg

Reputation: 19406

See Open Babel, in particular the useful links on the Open Babel site.

See also this blog (by Casper Steinmann) on chemistry with Python (mostly, though not entirely, using Open Babel).

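Update: see this code (untested):

from openbabel import pybel  # Open Babel 3.x; with 2.x use "import pybel"

mymol = pybel.readstring("smi",
    "[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)"
    "N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]")
mymol.addh()  # adds hydrogens in place and returns None, so print the atoms instead
print([atom.atomicnum for atom in mymol.atoms])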

Upvotes: 6

BioGeek

Reputation: 22827

Try frowns, a cheminformatics toolkit geared toward rapid development of chemistry-related algorithms. It is written almost entirely in Python, with a small portion in C++.

Upvotes: 2

chupvl

Reputation: 1298

You assume that a single 'C' atom always has 4 'H' atoms connected to it. This assumption is not correct: the number of hydrogens depends on how many bonds the carbon already forms to other atoms, so it can be 1, 2 or 3 hydrogens (or even 0 or 4).
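To illustrate why the count varies, here is a toy sketch of the rule used for unbracketed "organic subset" atoms in SMILES: the implicit hydrogen count is the smallest normal valence that covers the atom's bonds, minus the sum of its bond orders. `implicit_hydrogens` is a hypothetical function, not part of any library. It deliberately ignores bracket atoms, charges, stereo marks and aromatic lowercase atoms, so it cannot parse the question's example; use Open Babel, RDKit or CDK for real work.

```python
# Toy sketch, NOT a full SMILES parser: only unbracketed, non-aromatic
# organic-subset atoms, branches, ring-closure digits and -, =, # bonds.
DEFAULT_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}


def implicit_hydrogens(smiles):
    """Return (symbol, implicit H count) for each heavy atom, in input order."""
    symbols = []       # element symbol of each heavy atom
    bond_sums = []     # sum of bond orders to other heavy atoms
    branch_stack = []  # atom to return to when a ')' closes a branch
    ring_open = {}     # ring-closure digit -> (atom index, bond order)
    prev = None        # atom that the next atom bonds to
    order = 1          # pending bond order (single by default)
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if smiles[i:i + 2] in ("Cl", "Br"):   # two-letter symbols first
            sym, i = smiles[i:i + 2], i + 2
        elif ch in DEFAULT_VALENCES:
            sym, i = ch, i + 1
        else:
            sym = None
        if sym is not None:
            symbols.append(sym)
            bond_sums.append(0)
            idx = len(symbols) - 1
            if prev is not None:               # bond to the previous atom
                bond_sums[prev] += order
                bond_sums[idx] += order
            prev, order = idx, 1
            continue
        if ch in BOND_ORDERS:
            order = BOND_ORDERS[ch]
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            digit = int(ch)
            if digit in ring_open:             # close the ring bond
                other, other_order = ring_open.pop(digit)
                bo = max(order, other_order)   # bond symbol may sit at either end
                bond_sums[other] += bo
                bond_sums[prev] += bo
            else:                              # open a ring bond
                ring_open[digit] = (prev, order)
            order = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
        i += 1

    result = []
    for sym, used in zip(symbols, bond_sums):
        # Smallest normal valence that covers the explicit bonds (0 H if none does).
        target = next((v for v in DEFAULT_VALENCES[sym] if v >= used), used)
        result.append((sym, target - used))
    return result


# Methane, ethanol and neopentane: the carbons carry 4, 3/2 and 3/0 hydrogens.
for s in ("C", "CCO", "CC(C)(C)C"):
    print(s, implicit_hydrogens(s))
```

In neopentane (CC(C)(C)C) the central carbon has four carbon neighbours and therefore no hydrogens, which is exactly the case the "always 4 'H'" assumption gets wrong.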

Try Open Babel, CDK, or a similar cheminformatics library.

But why do you need all the atoms from the file?

Upvotes: 3
