Reputation: 19164
I am working with .smiles files. The file structure is described here: http://en.wikipedia.org/wiki/Chemical_file_format#SMILES
I want to get all the atoms from a SMILES file, including the hydrogens. For example, a single 'C' atom implies that 4 'H' atoms are bonded to it.
While searching, I found some Python modules that can parse the SMILES format, but they do not report the attached hydrogen atoms (for example, they return only 'C' and not the 4 'H' atoms bonded to that 'C').
How can I list all the atoms, including the bonded 'H' atoms, using Python?
Example SMILES string that needs to be expanded into all atoms, including the bonded 'H' atoms:
[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]
Thank you in advance.
Upvotes: 3
Views: 3840
Reputation: 839
RDKit is a well-established cheminformatics library for Python.
To read a molecule from a SMILES string:
from rdkit import Chem
m = Chem.MolFromSmiles('[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]')
Once the SMILES string has been read into an RDKit molecule, you can do most common cheminformatics operations on it. Documentation: http://www.rdkit.org
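To get the hydrogens as real atoms, one option in RDKit is `Chem.AddHs`, which returns a copy of the molecule with explicit hydrogen atoms. A minimal sketch, assuming RDKit is installed; the helper name `atoms_with_hydrogens` is made up for illustration:

```python
from rdkit import Chem


def atoms_with_hydrogens(smiles):
    """Return the element symbols of all atoms, hydrogens included."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None when the SMILES cannot be parsed
        raise ValueError("could not parse SMILES: %r" % smiles)
    # AddHs returns a new molecule in which the hydrogens are explicit atoms
    mol_h = Chem.AddHs(mol)
    return [atom.GetSymbol() for atom in mol_h.GetAtoms()]


if __name__ == "__main__":
    # Methane: one carbon plus the hydrogens RDKit adds to it
    print(atoms_with_hydrogens("C"))
```

Each atom returned by `GetAtoms()` also has a `GetNeighbors()` method, so the hydrogens bonded to a particular heavy atom can be listed in the same way.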
Upvotes: 0
Reputation: 2325
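For the molecular weight of a compound given as SMILES, the Python bindings of Open Babel should work:
from openbabel import pybel  # Open Babel 3.x; on 2.x use `import pybel`
# Read the first molecule from the SMILES file
mol = next(pybel.readfile("smi", "stuff.smi"))
print(mol.molwt)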
Upvotes: 3
Reputation: 19406
See Open Babel.
Useful Links on Open Babel Site
See also:
This blog (by Casper Steinmann) on chemistry with Python (using Open Babel, though not in every post)
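Update: see this code (untested):
from openbabel import pybel  # Open Babel 3.x; on 2.x use `import pybel`
mymol = pybel.readstring("smi",
    "[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)"
    "N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]")
mymol.addh()  # adds explicit hydrogens in place and returns None
for atom in mymol:  # iterate over every atom, hydrogens included
    print(atom.atomicnum)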
Upvotes: 6
Reputation: 22827
Try frowns, a chemoinformatics toolkit geared toward rapid development of chemistry related algorithms. It is written in almost 100% Python with a small portion written in C++.
Upvotes: 2
Reputation: 1298
"I want to get all the atoms from the smiles file. It means that If there is single 'C' atom it means that there will be 4 'H' atoms will be connected to them."
This assumption is not correct: depending on its other bonds, a carbon can carry 1, 2, or 3 hydrogens instead of 4.
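How many hydrogens an atom carries depends on how many bonds it already has. A rough sketch of the usual SMILES rule for unbracketed atoms (fill up to the lowest normal valence that fits), in plain Python. The valence table and the function name `implicit_hydrogens` are illustrative assumptions, not any library's API, and aromatic atoms are not handled:

```python
# Normal valences for the SMILES "organic subset" (illustrative table).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum):
    """Hydrogens implied for an unbracketed, non-aromatic atom.

    bond_order_sum is the total order of the bonds the atom already has
    to non-hydrogen neighbours (single = 1, double = 2, triple = 3).
    """
    for valence in NORMAL_VALENCES[symbol]:
        # Use the lowest normal valence that can hold the existing bonds
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # existing bonds already exceed every normal valence


if __name__ == "__main__":
    for bonds in range(5):
        print("C with bond order sum", bonds, "->",
              implicit_hydrogens("C", bonds), "H")
```

For carbon this gives 4, 3, 2, 1, and 0 hydrogens as the bond order sum goes from 0 to 4, so only an isolated 'C' (methane) has 4.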
Try Open Babel, CDK, or a similar cheminformatics library.
But why do you need all the atoms from the file?
Upvotes: 3