Reputation: 367
I am trying to learn Python by writing a simple program that generates a type of practice problem organic chemistry students often face on exams: the retrosynthesis question.
For those unfamiliar with this type of question: the student is given the initial and final species of a series of chemical reactions, then is asked to determine which reagents/reactions were applied to the initial reactant to obtain the final product.
Sometimes you are given only the final product and asked to list the reactions necessary to synthesize it, subject to some constraints (start only with a compound that has 5 carbons or fewer, use only alcohols, etc.).
So far, I've done some research, and I think RDKit with Python is a good place to start. My plan is to use the SMILES format for reading molecules (since I can manipulate it as I would a string), then define a function for each reaction. Finally, I'll need a database of chemical species from which the program can randomly select the initial and final species for the problem. The program selects a random species from the database, applies a series of reactions to it (3-5, specified by the user), then displays the final product. The user solves the question on their own, and the program then shows the path it took (using images of the intermediates and printing the reagents used to obtain them). Simple. In principle.
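A minimal sketch of the generator loop in that plan, using only the standard library. Everything named here (`generate_problem`, the two toy reaction functions, the reagent labels attached to them) is a placeholder invented for illustration, and the toy reactions are naive string edits that stand in for real chemistry:

```python
import random
from typing import Callable, List, Optional, Tuple

# A reaction maps a SMILES string to a product SMILES string,
# or returns None when it does not apply to that molecule.
Reaction = Callable[[str], Optional[str]]


def generate_problem(
    start_pool: List[str],
    reactions: List[Tuple[str, Reaction]],
    n_steps: int,
    rng: random.Random,
    max_tries: int = 100,
) -> List[Tuple[str, str]]:
    """Return the path as [(label, smiles), ...]; the first entry is the start."""
    current = rng.choice(start_pool)
    path = [("start", current)]
    tries = 0
    while len(path) - 1 < n_steps and tries < max_tries:
        tries += 1
        label, react = rng.choice(reactions)
        product = react(current)
        if product is None or product == current:
            continue  # not applicable here, so try another reaction
        path.append((label, product))
        current = product
    return path


# Toy placeholders: they only look at the end of the string.
def toy_bromide_to_alcohol(smiles: str) -> Optional[str]:
    return smiles[:-2] + "O" if smiles.endswith("Br") else None


def toy_alcohol_to_bromide(smiles: str) -> Optional[str]:
    return smiles[:-1] + "Br" if smiles.endswith("O") else None


reactions = [("NaOH", toy_bromide_to_alcohol), ("PBr3", toy_alcohol_to_bromide)]
path = generate_problem(["CCCBr"], reactions, n_steps=3, rng=random.Random(0))
for label, smiles in path:
    print(label, smiles)
```

The question shown to the user would be the first and last entries of `path`; the entries in between are the worked answer. Keeping the label next to each intermediate means the reagents can be printed without relying on function names.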
But once I started actually coding the functions, I ran into some problems. First, it is very tedious to write a function for every single reaction. Second, while SMILES can handle virtually all molecular complications thrown at it (stereochemistry, geometry, etc.), it allows multiple valid strings for the same molecule, and I'm having trouble keeping the reactions specific. Third, I'm using the "replace" method to manipulate the SMILES strings, and this gets me into trouble when I have regiospecific reactions that I want to make universal.
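On the multiple-forms problem, one possible approach is to round-trip every string through RDKit so that equivalent SMILES collapse to a single canonical form before they are compared or stored. A small sketch, assuming RDKit is installed; `canonical` is a hypothetical helper name:

```python
from rdkit import Chem


def canonical(smiles: str) -> str:
    """Parse a SMILES string and write it back in RDKit's canonical form."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


# Three different ways of writing ethanol give the same canonical string.
print(canonical("OCC"), canonical("CCO"), canonical("C(O)C"))
```

Comparing canonical strings (or working on the parsed molecule objects directly) avoids depending on how a particular SMILES happened to be written.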
For example: SN2 reactions work well with primary alkyl halides, but not at all with tertiary ones (steric hindrance). How would I create a function for this reaction?
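One way to express that kind of selectivity is a reaction SMARTS template, which matches substructures rather than substrings. The sketch below assumes RDKit is installed; the template, the constant `SN2_HYDROXIDE`, and the helper `sn2_products` are illustrative choices, and the rule is deliberately simplified to accept only methyl and primary carbons (secondary halides are excluded here as well, which is stricter than real SN2 behaviour):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# sp3 carbon with two or three hydrogens (primary or methyl) bearing a halide;
# the halide is replaced by oxygen, which becomes OH after sanitization.
SN2_HYDROXIDE = AllChem.ReactionFromSmarts("[CX4;H2,H3:1][Cl,Br,I]>>[C:1]O")


def sn2_products(smiles: str) -> list:
    """Return the distinct product SMILES, or an empty list if nothing matches."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    found = set()
    for (product,) in SN2_HYDROXIDE.RunReactants((mol,)):
        Chem.SanitizeMol(product)  # recompute valences and implicit hydrogens
        found.add(Chem.MolToSmiles(product))
    return sorted(found)


print(sn2_products("CCBr"))        # bromoethane, primary
print(sn2_products("CC(C)(C)Br"))  # tert-butyl bromide, tertiary
```

Because the hydrogen count is part of the pattern, the tertiary halide simply produces no match, so no special-case string handling is needed.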
Another problem: I want the reactions to be tagged by their respective reagents, so I've taken to naming the functions after the reagents used. But this becomes problematic when a reagent can take many different forms (Grignard reagents, for example).
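A possible alternative to encoding reagents in function names is to store each reaction as a data record that carries its own reagent label. The sketch below uses only the standard library; `ReactionRule`, `RULES`, and `rules_by_reagent` are hypothetical names, and the template strings are illustrative reaction SMARTS that are stored as plain text and not validated here:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReactionRule:
    name: str           # human-readable reaction name
    reagent_label: str  # what gets printed in the answer key
    template: str       # reaction SMARTS kept as data (illustrative only)


RULES = [
    ReactionRule(
        "SN2 with hydroxide", "NaOH",
        "[CX4;H2,H3:1][Cl,Br,I]>>[C:1]O",
    ),
    # One generic label covers every Grignard reagent: the R group is matched
    # by the pattern instead of being spelled out in a function name.
    ReactionRule(
        "Grignard addition to an aldehyde", "RMgX, then H3O+",
        "[CX3H1:1](=[O:2])[#6:3].[#6:4][Mg][Cl,Br,I]>>[#6:3][C:1]([O:2])[#6:4]",
    ),
    ReactionRule(
        "Grignard addition to a ketone", "RMgX, then H3O+",
        "[#6:3][CX3:1](=[O:2])[#6:5].[#6:4][Mg][Cl,Br,I]>>[#6:3][C:1]([O:2])([#6:5])[#6:4]",
    ),
]


def rules_by_reagent(rules: List[ReactionRule]) -> Dict[str, List[ReactionRule]]:
    """Group rules under their reagent label."""
    grouped: Dict[str, List[ReactionRule]] = defaultdict(list)
    for rule in rules:
        grouped[rule.reagent_label].append(rule)
    return dict(grouped)


for label, rules in rules_by_reagent(RULES).items():
    print(label, [rule.name for rule in rules])
```

Adding a reaction then means adding one record rather than writing a new function, and the label shown to the user is independent of how the rule is implemented.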
I feel like there is a better, less repetitive and tedious way to tackle this. I'm looking for a nudge in the right direction.
Upvotes: 5
Views: 3338
Reputation: 485
It might help to look for free (or, if you have access, commercial) software written in Python that solves the same problem or a closely related one. Study its functionality and its approach to the problem and, if possible, obtain its source code. I find this helpful in many ways.
Upvotes: 1
Reputation: 2335
That's a pretty ambitious task, and you're not the first to undertake it. Prominent examples were/are:
LHASA, originally developed in the group of E.J. Corey at Harvard University
WODCA, developed in the group of J. Gasteiger at Erlangen University
CHIRON, developed in the group of S. Hanessian at the University of Montreal
These projects have seen several man-decades of development, but I do not have any reliable information on their current state.
Upvotes: 8