Reputation: 11883
I am using RDKit, a cheminformatics toolkit that provides a PostgreSQL cartridge for storing chemical molecules. I want to create a Django model as follows:
from django.db import models
from rdkit.Chem import Mol

class compound(models.Model):
    internal = models.CharField(max_length=10, db_index=True)
    external = models.CharField(max_length=15, db_index=True)
    smiles = models.TextField()
    # This is my proposed custom "mol" type defined by the rdkit cartridge, which probably maps
    # to the Mol object imported from rdkit.Chem
    rdkit_mol = models.MyCustomMolField()
So I want "rdkit_mol" to map to the rdkit Postgres cartridge type "mol". In SQL, the "mol" column is populated from the "smiles" string using syntax like
postgres@compounds=# insert into compound (smiles,rdkit_mol,internal,external) VALUES ('C1=CC=C[N]1',mol_from_smiles('C1=CC=C[N]1'), 'MYID-111111', 'E-2222222');
This calls the "mol_from_smiles" database function defined by the cartridge to create the mol object.
Should I have the database take care of populating this column during save? I could then define a custom TRIGGER in Postgres that runs the mol_from_smiles function to populate the rdkit_mol column.
I also want to be able to execute queries that use the custom features of the mol type and return Django models. For example, one of the SQL queries could return the compound models that look like a given one chemically. Currently in SQL I do
select * from compound where rdkit_mol @> 'C1=CC=C[N]1';
This essentially returns the chemical "compound" objects.
My question is: given the custom nature of my field, is there a way to mix and match the features of the database "mol" type with the Django compound model? What are the ways to achieve this?
Currently I am leaning towards not using the Django ORM and just using raw SQL to round-trip to and from the database. I want to find out whether there is a Django way of working with such custom types.
In my current hybrid approach, my views would look like this:
def get_similar_compounds(request):
    # code to get the raw smiles string, e.g. 'C1=CC=C[N]1', from a form
    db_cursor.execute("select internal from compound where rdkit_mol @> 'C1=CC=C[N]1';")
    # code to get internal ids from database cursor
    similar_compounds = compound.objects.filter(internal__in=ids_from_query_above)
    # Then process queryset
Is this hybrid method advisable, or is there a more Pythonic/Django way of dealing with this custom data type?
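One caveat about the view above: it embeds the SMILES string directly in the SQL text. A minimal sketch of a parameterized version follows, assuming the psycopg2 `%s` placeholder style. The helper name `build_substructure_query` and its default arguments are hypothetical; the table and column names are taken from the query above.

```python
import re

# Plain SQL identifier: letters, digits and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def build_substructure_query(smiles, table="compound", column="rdkit_mol"):
    """Return (sql, params) for a "<column> @> <smiles>" search.

    Hypothetical helper: identifiers are validated and interpolated into the
    SQL text, while the SMILES string is passed separately as a parameter.
    """
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError("unsafe SQL identifier: %r" % (name,))
    sql = "select * from {0} where {1} @> %s".format(table, column)
    return sql, [smiles]


sql, params = build_substructure_query("C1=CC=C[N]1")
# Assumed usage (needs Django and the rdkit cartridge, so it is left commented out):
#   db_cursor.execute(sql, params)
#   similar_compounds = compound.objects.raw(sql, params)
```

With Django's `Manager.raw()`, the same pair should yield compound instances directly, which would avoid the second `internal__in` query. This is an untested sketch, not something verified against the cartridge.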
Upvotes: 3
Views: 2098
Reputation: 11883
My question mainly dealt with the mechanics of creating a Django custom field to handle the "mol" data type defined by the Postgres rdkit cartridge.
The solution I worked out consists of a custom field that co-exists with my model, plus raw SQL to run queries against the mol type.
Since an rdkit "mol" value has to be created every time a model instance containing a SMILES string is saved, I created a database function and a trigger that fires on table insert or update.
# A South migration that defines a function called write_rdkit_mol_south in PL/pgSQL
from south.utils import datetime_utils as datetime
from south.db import db
from south.v2 import DataMigration
from django.db import models

class Migration(DataMigration):

    def forwards(self, orm):
        "Write your forwards methods here."
        db.execute("""create function write_rdkit_mol_south() RETURNS trigger as $write_rdkit_mol_south$
            BEGIN
                NEW.rdkit_mol := mol_from_smiles(NEW.smiles::cstring);
                RETURN NEW;
            END;
            $write_rdkit_mol_south$ LANGUAGE plpgsql;""")
        db.execute(
            "create TRIGGER write_rdkit_mol_trig BEFORE INSERT OR UPDATE on strucinfo_compound FOR EACH ROW EXECUTE PROCEDURE write_rdkit_mol_south();")
        # Note: Don't use "from appname.models import ModelName".
        # Use orm.ModelName to refer to models in this application,
        # and orm['appname.ModelName'] for models in other applications.

    def backwards(self, orm):
        "Write your backwards methods here."
        db.execute("drop TRIGGER write_rdkit_mol_trig ON strucinfo_compound;")
        db.execute("DROP FUNCTION write_rdkit_mol_south();")
Next I created the custom field and the model.
# My Django model:
# (RdkitMolField, shown below, must be defined or imported before this class)
class compound(models.Model):
    internalid = models.CharField(max_length=10, db_index=True)
    externalid = models.CharField(max_length=15, db_index=True)
    smiles = models.TextField()
    rdkit_mol = RdkitMolField()

    def save(self, *args, **kwargs):
        # Blank the field so that the database trigger fills it in from smiles
        self.rdkit_mol = ""
        super(compound, self).save(*args, **kwargs)
# The custom field (Python 2 code, hence basestring)
from django.db import models
from django.db.utils import DatabaseError
from rdkit import Chem

class RdkitMolField(models.Field):
    description = "Rdkit molecule field"

    def __init__(self, *args, **kwds):
        super(RdkitMolField, self).__init__(*args, **kwds)

    def db_type(self, connection):
        if connection.settings_dict['ENGINE'] == 'django.db.backends.postgresql_psycopg2':
            return None
        else:
            raise DatabaseError('Field type only supported for Postgres with rdkit cartridge')

    def to_python(self, value):
        if isinstance(value, Chem.Mol):
            return value
        if isinstance(value, basestring):
            # The database normally returns the smiles string
            return Chem.MolFromSmiles(str(value))
        else:
            if value:
                # If mol_send was used then we will have a pickled object
                return Chem.Mol(str(value))
            else:
                # The None case
                return "NO MOL"

    def get_prep_value(self, value):
        # This gets called during save. According to the docs, the method should return
        # data in a format that has been prepared for use as a parameter in a query.
        # rdkit_mol queries do accept smiles strings.
        if isinstance(value, basestring):
            db_smiles = str(value)
            if not db_smiles:
                return None
            my_mol = Chem.MolFromSmiles(db_smiles)
            if my_mol:
                # Roundtrip with object could be avoided
                return str(Chem.MolToSmiles(my_mol))
        # This is the None case (it also covers an unparsable smiles string).
        # The database trigger will handle this, as it should happen only during insert or update.
        return None

    def validate(self, value, model_instance):
        # This field is handled by the database trigger, so we do not want it to be used for object initiation
        if value is None:
            return
        else:
            super(RdkitMolField, self).validate(value, model_instance)
Upvotes: 1
Reputation: 16226
The way to mix the two is to provide a custom field implementation, which is what you are already doing. There is not much more to it.
Custom fields have quite an extensive protocol for customizing their behavior. You can customize what happens before a value is sent to the database, what happens when it is received, and what happens when a particular lookup (e.g. mol__in=sth) is used.
In the current development version, Django allows providing custom lookup types, so you could even implement the @> operator (though I recommend sticking with the official stable version).
In the end it depends on what is easier for you. Providing a good, coherent implementation of MolField can prove time-consuming, so it really depends on how many places you need it in. It could be more pragmatic to just use raw SQL in those few places.
Upvotes: 1