sam

Reputation: 19204

How to get a molecule's name from its SMILES string using Python

I have a number of molecules in SMILES format, and I want to get the name of each molecule from its SMILES string using Python.

For example:

CN1CCC[C@H]1c2cccnc2 - Nicotine  
OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2 - Thiamin

Which Python module can do this kind of conversion?
Kindly let me know.

Upvotes: 4

Views: 1990

Answers (4)

Thigh Master 3000

Reputation: 37

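Install the RDKit module and use something like this:

from rdkit import Chem
from rdkit.Chem import Draw

# Pairs of [SMILES string, name]
ms_smis = [["CN1CCC[C@H]1c2cccnc2", "Nicotine"],
           ["OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2", "Thiamin"]]
# Parse each SMILES string into an RDKit molecule
ms = [[Chem.MolFromSmiles(x[0]), x[1]] for x in ms_smis]

# Draw each molecule to a PNG file named after it
for m in ms:
    Draw.MolToFile(m[0], m[1] + ".png", size=(800, 800))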

Here is the documentation: https://www.rdkit.org/docs/GettingStartedInPython.html

Upvotes: -1

Ps98

Reputation: 21

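Reference: NCI/CADD

from urllib.parse import quote
from urllib.request import urlopen

def CIRconvert(smi):
    """Return the IUPAC name for a SMILES string, or a placeholder on failure."""
    try:
        # Percent-encode the SMILES so characters such as '#' survive in the URL
        url = "https://cactus.nci.nih.gov/chemical/structure/" + quote(smi) + "/iupac_name"
        ans = urlopen(url).read().decode('utf8')
        return ans
    except Exception:
        return 'Name Not Available'

smiles = 'CCCCC(C)CC'
print(smiles, CIRconvert(smiles))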

Output: CCCCC(C)CC - 3-Methylheptane
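
For a whole list of molecules, the same lookup can be wrapped in a small batch helper. This is only a minimal sketch and not part of the original answer: `cir_url`, `fetch_text`, and `lookup_names` are hypothetical names, and it assumes the resolver accepts percent-encoded SMILES in the URL path, which matters for characters such as `#`. The `fetch` parameter exists so the network call can be swapped out, for example when testing offline.

```python
import time
from urllib.parse import quote
from urllib.request import urlopen

CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure/"


def cir_url(smiles, representation="iupac_name"):
    """Build the resolver URL, percent-encoding the SMILES string."""
    return CIR_BASE + quote(smiles) + "/" + representation


def fetch_text(url):
    """Download a URL and return its body as text (raises on HTTP errors)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf8")


def lookup_names(smiles_list, fetch=fetch_text, delay=0.0):
    """Map each distinct SMILES string to a name, or None if the lookup fails."""
    names = {}
    for smi in smiles_list:
        if smi in names:  # skip duplicates instead of querying twice
            continue
        try:
            names[smi] = fetch(cir_url(smi)).strip()
        except OSError:  # URLError and HTTPError are subclasses of OSError
            names[smi] = None
        time.sleep(delay)  # optional pause between requests
    return names
```

A call such as `lookup_names(["CN1CCC[C@H]1c2cccnc2", "CCCCC(C)CC"], delay=0.5)` would then return a dictionary keyed by SMILES string, with `None` for any molecule the service could not name.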

Upvotes: 1

Tim

Reputation: 2212

There is a section in the Open Babel documentation on similarity searching that you may want to look at. You could combine this with an SDF file derived from ChEMBL.

I will give this a go later, as it may be much more fruitful than my previous answer!
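
Similarity searches of this kind usually rank candidates by the Tanimoto coefficient between molecular fingerprints. As a rough illustration only, the sketch below computes that coefficient in plain Python over sets of "on" bit positions. The fingerprints and compound names are made-up placeholders, not real Open Babel output; in practice the fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of set-bit indices."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def best_match(query_bits, library):
    """Return (name, score) of the library entry most similar to the query."""
    return max(
        ((name, tanimoto(query_bits, bits)) for name, bits in library.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical fingerprints, for illustration only.
library = {
    "compound A": {1, 4, 9, 16},
    "compound B": {2, 4, 8, 16, 32},
}
print(best_match({1, 4, 9, 25}, library))
```

The name attached to the best-scoring library entry is then the candidate name for the query molecule, provided the score is high enough to trust.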

Upvotes: 1

Tim

Reputation: 2212

I don't know of any single module that will let you do this. I had to play data wrangler to try to get a satisfactory answer.

I tackled this using Wikipedia, which is being used more and more for structured bioinformatics/cheminformatics data, but as it turned out, my program reveals that a lot of that data is incorrect.

I used urllib to submit a SPARQL query to DBpedia, first searching for the SMILES string and, failing that, searching for the molecular weight of the compound.

import sys
import traceback
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

from openbabel import pybel  # Open Babel 3.x; with 2.x use "import pybel"

def query(q, epr, f='application/json'):
    """Send the SPARQL query q to the endpoint epr and return the response body."""
    try:
        params = urlencode({'query': q})
        request = Request(epr + '?' + params, headers={'Accept': f})
        with urlopen(request) as response:
            return response.read().decode('utf-8')
    except Exception:
        traceback.print_exc(file=sys.stdout)
        raise

url = 'http://dbpedia.org/sparql'

# Look up a compound by its exact SMILES string
q1 = '''
select ?name where {
    ?s <http://dbpedia.org/property/smiles> "%s"@en.
    ?s rdfs:label ?name.
    FILTER(LANG(?name) = "" || LANGMATCHES(LANG(?name), "en"))
}
limit 10
'''
# Fallback: look up a compound by its molecular weight
q2 = '''
select ?name where {
    ?s <http://dbpedia.org/property/molecularWeight> '%s'^^xsd:double.
    ?s rdfs:label ?name.
    FILTER(LANG(?name) = "" || LANGMATCHES(LANG(?name), "en"))
}
limit 10
'''

# One SMILES string per line; blank lines are dropped
smiles = [line for line in '''

CN1CCC[C@H]1c2cccnc2
CN(CCC1)[C@@H]1C2=CC=CN=C2

OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2

Cc1nnc2CN=C(c3ccccc3)c4cc(Cl)ccc4-n12

CN1C(=O)CN=C(c2ccccc2)c3cc(Cl)ccc13

CCc1nn(C)c2c(=O)[nH]c(nc12)c3cc(ccc3OCC)S(=O)(=O)N4CCN(C)CC4

CC(C)(N)Cc1ccccc1

CN(C)C(=O)Cc1c(nc2ccc(C)cn12)c3ccc(C)cc3

COc1ccc2[nH]c(nc2c1)S(=O)Cc3ncc(C)c(OC)c3C

CCN(CC)C(=O)[C@H]1CN(C)[C@@H]2Cc3c[nH]c4cccc(C2=C1)c34
'''.splitlines() if line]

# Parse every SMILES string up front so the molecular weight is available later
OBMolecules = {}
for smile in smiles:
    try:
        OBMolecules[smile] = pybel.readstring('smi', smile)
    except Exception as e:
        print(e)

for smi in smiles:
    print('--------------')
    print(smi)
    try:
        print("searching by smiles string..")
        results = json.loads(query(q1 % smi, url))
        if len(results['results']['bindings']) == 0:
            raise Exception('no results from smiles')
        else:
            print('NAME: ', results['results']['bindings'][0]['name']['value'])

    except Exception as e:
        print(e)

        # The SMILES search failed, so fall back to the molecular weight
        try:
            mol_weight = round(OBMolecules[smi].molwt, 2)
            print("search ing by molecular weight %s" % mol_weight)
            results = json.loads(query(q2 % mol_weight, url))
            if len(results['results']['bindings']) == 0:
                raise Exception('no results from molecular weight')
            else:
                print('NAME: ', results['results']['bindings'][0]['name']['value'])
        except Exception as e:
            print(e)

Output:

--------------
CN1CCC[C@H]1c2cccnc2
searching by smiles string..
no results from smiles
search ing by molecular weight 162.23
NAME:  Anabasine
--------------
CN(CCC1)[C@@H]1C2=CC=CN=C2
searching by smiles string..
no results from smiles
search ing by molecular weight 162.23
NAME:  Anabasine
--------------
OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2
searching by smiles string..
no results from smiles
search ing by molecular weight 267.37
NAME:  Pipradrol
--------------
Cc1nnc2CN=C(c3ccccc3)c4cc(Cl)ccc4-n12
searching by smiles string..
no results from smiles
search ing by molecular weight 308.76
no results from molecular weight
--------------
CN1C(=O)CN=C(c2ccccc2)c3cc(Cl)ccc13
searching by smiles string..
no results from smiles
search ing by molecular weight 284.74
NAME:  Mazindol
--------------
CCc1nn(C)c2c(=O)[nH]c(nc12)c3cc(ccc3OCC)S(=O)(=O)N4CCN(C)CC4
searching by smiles string..
no results from smiles
search ing by molecular weight 460.55
no results from molecular weight
--------------
CC(C)(N)Cc1ccccc1
searching by smiles string..
no results from smiles
search ing by molecular weight 149.23
NAME:  Phenpromethamine
--------------
CN(C)C(=O)Cc1c(nc2ccc(C)cn12)c3ccc(C)cc3
searching by smiles string..
no results from smiles
search ing by molecular weight 307.39
NAME:  Talastine
--------------
COc1ccc2[nH]c(nc2c1)S(=O)Cc3ncc(C)c(OC)c3C
searching by smiles string..
no results from smiles
search ing by molecular weight 345.42
no results from molecular weight
--------------
CCN(CC)C(=O)[C@H]1CN(C)[C@@H]2Cc3c[nH]c4cccc(C2=C1)c34
searching by smiles string..
no results from smiles
search ing by molecular weight 323.43
NAME:  Lysergic acid diethylamide

As you can see, the first two results, which should be nicotine, come out wrong. This is because the Wikipedia entry for nicotine reports the molecular mass in the molecular weight field.
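
A molecular weight search is also ambiguous by nature, since isomers share the same weight (nicotine and anabasine are both C10H14N2), so taking only the first binding can pick the wrong compound. A hedged sketch of listing every candidate instead is shown here. It assumes the standard SPARQL JSON results layout that the script above already indexes into; `candidate_names` and the sample response are hypothetical and not actual DBpedia output.

```python
import json


def candidate_names(response_text):
    """Return every distinct ?name value from a SPARQL JSON result, in order."""
    bindings = json.loads(response_text)["results"]["bindings"]
    names = []
    for row in bindings:
        value = row["name"]["value"]
        if value not in names:  # drop repeated labels
            names.append(value)
    return names


# Hypothetical response body, shaped like the endpoint's JSON output.
sample = json.dumps({
    "head": {"vars": ["name"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "xml:lang": "en", "value": "Anabasine"}},
        {"name": {"type": "literal", "xml:lang": "en", "value": "Nicotine"}},
        {"name": {"type": "literal", "xml:lang": "en", "value": "Anabasine"}},
    ]},
})
print(candidate_names(sample))
```

Printing all candidates for a given weight makes it obvious when the match is not unique, so the result can be checked by hand rather than trusted blindly.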

Upvotes: 1
