Reputation: 19204
I have a number of molecules in SMILES format, and I want to get each molecule's name from its SMILES string using Python.
For example:
CN1CCC[C@H]1c2cccnc2 - Nicotine
OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2 - Thiamin
Which Python module can help me do such conversions?
Kindly let me know.
Upvotes: 4
Views: 1990
Reputation: 37
Download the RDKit module and use something like this:
from rdkit import Chem
from rdkit.Chem import Draw
# Pairs of [SMILES, name]; the name becomes the image file name.
ms_smis = [["CN1CCC[C@H]1c2cccnc2", "Nicotine"],
           ["OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2", "Thiamin"]]
ms = [[Chem.MolFromSmiles(x[0]), x[1]] for x in ms_smis]
for m in ms:
    Draw.MolToFile(m[0], m[1] + ".png", size=(800, 800))
Here is the documentation: https://www.rdkit.org/docs/GettingStartedInPython.html
Upvotes: -1
Reputation: 21
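Reference: NCI/CADD
from urllib.parse import quote
from urllib.request import urlopen

def CIRconvert(smi):
    try:
        # Percent-encode characters such as '#' or '[' that are not safe in a URL path
        url = "https://cactus.nci.nih.gov/chemical/structure/" + quote(smi) + "/iupac_name"
        ans = urlopen(url).read().decode('utf8')
        return ans
    except Exception:
        # Any failure (network error, unknown structure) is reported as a missing name
        return 'Name Not Available'

smiles = 'CCCCC(C)CC'
print(smiles, CIRconvert(smiles))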
Output: CCCCC(C)CC - 3-Methylheptane
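For a whole list of molecules, the same lookup can be wrapped in a small batch helper. The following is only a sketch: `names_for` and its `fetch` parameter are hypothetical names, not part of any library, and `fetch` stands for any function that maps one SMILES string to a name (for example `CIRconvert` above). Repeated SMILES strings are looked up only once, which keeps the number of web requests down.

```python
def names_for(smiles_list, fetch):
    """Map each SMILES string to a name using fetch(smiles) -> str.

    fetch is any callable supplied by the caller (hypothetical parameter).
    Each distinct SMILES string is passed to fetch only once.
    """
    cache = {}
    for smi in smiles_list:
        if smi not in cache:
            cache[smi] = fetch(smi)
    return [(smi, cache[smi]) for smi in smiles_list]


if __name__ == "__main__":
    # Stand-in lookup so the sketch runs without network access;
    # swap in a real lookup such as CIRconvert.
    known = {"CN1CCC[C@H]1c2cccnc2": "Nicotine"}

    def offline_fetch(smi):
        return known.get(smi, "Name Not Available")

    for smi, name in names_for(["CN1CCC[C@H]1c2cccnc2", "CCO"], offline_fetch):
        print(smi, name)
```

Passing the lookup in as an argument also makes it easy to add a delay or a retry around the web call without touching the loop.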
Upvotes: 1
Reputation: 2212
There is a section in the Open Babel documentation on similarity searching that you may want to look at; you could combine this with an SD file derived from ChEMBL.
I will give this a go later, as it may be much more fruitful than my previous answer!
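Similarity searching usually means comparing molecular fingerprints with a measure such as the Tanimoto coefficient. As a rough illustration of the idea only (this is not Open Babel's API), the sketch below treats a fingerprint as a set of "on" bit positions. The function names, bit sets, and compound labels are made up for the example; a real search would take its fingerprints from a cheminformatics toolkit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of set-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def most_similar(query_bits, library):
    """Return (name, score) of the library entry closest to the query.

    library maps a compound name to its fingerprint bit set.
    """
    return max(
        ((name, tanimoto(query_bits, bits)) for name, bits in library.items()),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Invented bit sets, for illustration only.
    library = {"compound A": {1, 4, 9, 16}, "compound B": {2, 3, 5, 7}}
    print(most_similar({1, 4, 9, 25}, library))
```

The best-scoring library entry then supplies the candidate name, ideally with a score threshold so that poor matches are rejected rather than reported.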
Upvotes: 1
Reputation: 2212
I don't know of any single module that will let you do this; I had to play data wrangler to try to get a satisfactory answer.
I tackled this using Wikipedia, which is being used more and more for structured bioinformatics / chemoinformatics data, but as it turned out, my program reveals that a lot of that data is incorrect.
I used urllib to submit a SPARQL query to DBpedia, first searching for the SMILES string and, failing that, searching for the molecular weight of the compound.
import sys
import json
import traceback
import urllib.parse
import urllib.request
import pybel  # with Open Babel 3.x use: from openbabel import pybel

def query(q, epr, f='application/json'):
    """Send SPARQL query q to endpoint epr and return the response body as text."""
    try:
        params = urllib.parse.urlencode({'query': q})
        opener = urllib.request.build_opener(urllib.request.HTTPHandler)
        request = urllib.request.Request(epr + '?' + params)
        request.add_header('Accept', f)
        request.get_method = lambda: 'GET'
        url = opener.open(request)
        return url.read().decode('utf-8')
    except Exception:
        traceback.print_exc(file=sys.stdout)
        raise

url = 'http://dbpedia.org/sparql'

# Look up a compound by its exact SMILES string
q1 = '''
select ?name where {
  ?s <http://dbpedia.org/property/smiles> "%s"@en.
  ?s rdfs:label ?name.
  FILTER(LANG(?name) = "" || LANGMATCHES(LANG(?name), "en"))
}
limit 10
'''

# Look up a compound by its molecular weight
q2 = '''
select ?name where {
  ?s <http://dbpedia.org/property/molecularWeight> '%s'^^xsd:double.
  ?s rdfs:label ?name.
  FILTER(LANG(?name) = "" || LANGMATCHES(LANG(?name), "en"))
}
limit 10
'''

# One SMILES string per line; empty lines are dropped
smiles = list(filter(None, '''
CN1CCC[C@H]1c2cccnc2
CN(CCC1)[C@@H]1C2=CC=CN=C2
OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2
Cc1nnc2CN=C(c3ccccc3)c4cc(Cl)ccc4-n12
CN1C(=O)CN=C(c2ccccc2)c3cc(Cl)ccc13
CCc1nn(C)c2c(=O)[nH]c(nc12)c3cc(ccc3OCC)S(=O)(=O)N4CCN(C)CC4
CC(C)(N)Cc1ccccc1
CN(C)C(=O)Cc1c(nc2ccc(C)cn12)c3ccc(C)cc3
COc1ccc2[nH]c(nc2c1)S(=O)Cc3ncc(C)c(OC)c3C
CCN(CC)C(=O)[C@H]1CN(C)[C@@H]2Cc3c[nH]c4cccc(C2=C1)c34
'''.splitlines()))

# Parse every SMILES string with Open Babel so the molecular weight is available
OBMolecules = {}
for smile in smiles:
    try:
        OBMolecules[smile] = pybel.readstring('smi', smile)
    except Exception as e:
        print(e)

for smi in smiles:
    print('--------------')
    print(smi)
    try:
        print("searching by smiles string..")
        results = json.loads(query(q1 % smi, url))
        if len(results['results']['bindings']) == 0:
            raise Exception('no results from smiles')
        else:
            print('NAME: ', results['results']['bindings'][0]['name']['value'])
    except Exception as e:
        print(e)
        # Fall back to a molecular-weight search when the SMILES search fails
        try:
            mol_weight = round(OBMolecules[smi].molwt, 2)
            print("search ing by molecular weight %s" % mol_weight)
            results = json.loads(query(q2 % mol_weight, url))
            if len(results['results']['bindings']) == 0:
                raise Exception('no results from molecular weight')
            else:
                print('NAME: ', results['results']['bindings'][0]['name']['value'])
        except Exception as e2:
            print(e2)
Output:
--------------
CN1CCC[C@H]1c2cccnc2
searching by smiles string..
no results from smiles
search ing by molecular weight 162.23
NAME: Anabasine
--------------
CN(CCC1)[C@@H]1C2=CC=CN=C2
searching by smiles string..
no results from smiles
search ing by molecular weight 162.23
NAME: Anabasine
--------------
OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2
searching by smiles string..
no results from smiles
search ing by molecular weight 267.37
NAME: Pipradrol
--------------
Cc1nnc2CN=C(c3ccccc3)c4cc(Cl)ccc4-n12
searching by smiles string..
no results from smiles
search ing by molecular weight 308.76
no results from molecular weight
--------------
CN1C(=O)CN=C(c2ccccc2)c3cc(Cl)ccc13
searching by smiles string..
no results from smiles
search ing by molecular weight 284.74
NAME: Mazindol
--------------
CCc1nn(C)c2c(=O)[nH]c(nc12)c3cc(ccc3OCC)S(=O)(=O)N4CCN(C)CC4
searching by smiles string..
no results from smiles
search ing by molecular weight 460.55
no results from molecular weight
--------------
CC(C)(N)Cc1ccccc1
searching by smiles string..
no results from smiles
search ing by molecular weight 149.23
NAME: Phenpromethamine
--------------
CN(C)C(=O)Cc1c(nc2ccc(C)cn12)c3ccc(C)cc3
searching by smiles string..
no results from smiles
search ing by molecular weight 307.39
NAME: Talastine
--------------
COc1ccc2[nH]c(nc2c1)S(=O)Cc3ncc(C)c(OC)c3C
searching by smiles string..
no results from smiles
search ing by molecular weight 345.42
no results from molecular weight
--------------
CCN(CC)C(=O)[C@H]1CN(C)[C@@H]2Cc3c[nH]c4cccc(C2=C1)c34
searching by smiles string..
no results from smiles
search ing by molecular weight 323.43
NAME: Lysergic acid diethylamide
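As you can see, the first two results, which should be nicotine, come out wrong. This is because the Wikipedia entry for nicotine reports the molecular mass in the molecular weight field. Note also that anabasine is an isomer of nicotine (both are C10H14N2), so a molecular weight search alone cannot tell the two apart.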
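Since one molecular weight can match several compounds, it may help to list every candidate the query returns instead of only the first one. The sketch below assumes the JSON layout the program above already relies on (`results['results']['bindings'][i]['name']['value']`). `candidate_names` is a hypothetical helper, and the sample response is hand-written for illustration rather than real DBpedia output.

```python
import json


def candidate_names(response_text):
    """Return every distinct name in a SPARQL JSON response, in order."""
    data = json.loads(response_text)
    names = []
    for binding in data.get("results", {}).get("bindings", []):
        value = binding.get("name", {}).get("value")
        if value is not None and value not in names:
            names.append(value)
    return names


if __name__ == "__main__":
    # Hand-written sample in the same shape as the endpoint's JSON; not real data.
    sample = json.dumps({
        "results": {"bindings": [
            {"name": {"type": "literal", "value": "Candidate one"}},
            {"name": {"type": "literal", "value": "Candidate two"}},
        ]}
    })
    print(candidate_names(sample))
```

With all candidates in hand, a second check (for example comparing molecular formulas or structures) could then pick the right entry among compounds of equal weight.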
Upvotes: 1