Reputation: 58
I am working with RDKit and am using an algorithm to randomly generate Morgan fingerprints, all 2048 bits long. I am wondering if there's a way to trace a fingerprint back to the molecule it came from, whether as a SMILES string, a name, etc. Thanks!
Upvotes: 2
Views: 4011
Reputation: 140
No, these fingerprints cannot be converted back into molecules: information about how many times each 'structure' (each 1-bit) occurs and where it sits in the molecule is missing from the fingerprint. What you can do, if you still have the molecule that produced the fingerprint, is draw the substructure behind each 1-bit (each bit that is set in the Morgan fingerprint):
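from rdkit.Chem import AllChem, Draw, MolFromSmiles
m = MolFromSmiles('c1ccccc1O')  # example molecule (phenol); substitute your own
bi = {}  # filled with {bit: ((atom_idx, radius), ...)} during fingerprinting
fp = AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048, bitInfo=bi)
tpls = [(m, x, bi) for x in fp.GetOnBits()]  # draw all real 1-bits
Draw.DrawMorganBits(tpls, molsPerRow=3, subImgSize=(400, 400), legends=[str(x) for x in fp.GetOnBits()])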
The output is a grid of drawings, one per 1-bit, each labelled with its bit index.
Upvotes: 1
Reputation: 5233
To my knowledge there's no way to recover a chemical structure from a fingerprint. Fingerprints map an unlimited variety of chemical substructures onto a fixed number of bits, so different substructures must sometimes set the same bit (bit collisions).
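The collision argument is just the pigeonhole principle. As a minimal sketch, assuming a toy "hash and fold" step (CRC32 modulo the bit count, which is not RDKit's actual hash function), any 2049 distinct feature labels folded into 2048 bits must produce at least one clash:

```python
import zlib


def fold(feature: str, n_bits: int) -> int:
    """Toy hash-and-fold: map a feature label to a bit index in [0, n_bits)."""
    # crc32 is deterministic across runs, unlike Python's salted built-in hash() for str
    return zlib.crc32(feature.encode("utf-8")) % n_bits


def first_collision(features, n_bits):
    """Return (first_label, clashing_label, bit) for the first clash found, or None."""
    seen = {}
    for f in features:
        bit = fold(f, n_bits)
        if bit in seen and seen[bit] != f:
            return seen[bit], f, bit
        seen.setdefault(bit, f)
    return None


# Hypothetical labels: 2049 distinct features cannot all get their own bit out of 2048.
labels = [f"feature-{i}" for i in range(2049)]
print(first_collision(labels, 2048))
```

Once two features share a bit, the fingerprint alone cannot tell you which of them was present.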
Furthermore, fingerprints only track the presence or absence of different substructures. Fingerprints don't tell you how many times a substructure is present, or how substructures are connected. So the fingerprint doesn't give you the information to reconstruct the initial molecule from the substructures.
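A small illustration of the counting problem, assuming a toy bit-vector fingerprint built from hypothetical substructure labels (real Morgan fingerprints hash atom environments instead): because a bit is either on or off, a molecule containing a fragment twice gives the same bits as one containing it once.

```python
import zlib

N_BITS = 2048


def toy_fingerprint(features, n_bits=N_BITS):
    """Toy bit-vector fingerprint: the set of bit indices switched on by the features."""
    return frozenset(zlib.crc32(f.encode("utf-8")) % n_bits for f in features)


# Hypothetical feature lists: the first contains the ethyl-like fragment twice.
two_ethyls = ["CH3", "CH2", "CH2-CH3", "CH3", "CH2", "CH2-CH3"]
one_ethyl = ["CH3", "CH2", "CH2-CH3"]
print(toy_fingerprint(two_ethyls) == toy_fingerprint(one_ethyl))  # True
```

Setting a bit a second time changes nothing, so the count is simply not stored.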
You can use RDKit to see what substructures correspond with different bits in the fingerprint (see here).
My suggestion would be to create a class that holds both the SMILES string and the corresponding fingerprint, so that the information stays together.
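A minimal sketch of that idea; the class names `FingerprintRecord` and `FingerprintIndex` are hypothetical, and the fingerprint is stored as a frozenset of on-bit indices (with RDKit you could fill it with `frozenset(fp.GetOnBits())`). The lookup returns a list because, due to collisions, several molecules may share one fingerprint:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FingerprintRecord:
    """Keeps a SMILES string together with the on-bits of its fingerprint."""
    smiles: str
    on_bits: frozenset  # e.g. frozenset(fp.GetOnBits()) when using RDKit


class FingerprintIndex:
    """Hypothetical lookup table: fingerprint on-bits -> every SMILES that produced them."""

    def __init__(self):
        self._by_bits = defaultdict(list)

    def add(self, record: FingerprintRecord) -> None:
        self._by_bits[record.on_bits].append(record.smiles)

    def lookup(self, on_bits) -> list:
        # .get avoids inserting empty entries for fingerprints never seen
        return list(self._by_bits.get(frozenset(on_bits), []))


# Placeholder bit sets, not real Morgan bits.
index = FingerprintIndex()
index.add(FingerprintRecord("CCO", frozenset({1, 2, 3})))
print(index.lookup({1, 2, 3}))  # ['CCO']
print(index.lookup({4, 5}))  # []
```

If every randomly generated fingerprint is stored this way next to the molecule it came from, there is nothing to reverse engineer later.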
Upvotes: 0
Reputation: 775
A couple of points on this:
Morgan fingerprints are not a unique representation of a molecule. Due to bit collisions, many molecules can theoretically produce the same fingerprint.
However, Morgan fingerprints with 2048 bits are quite sparse, so the chance of a collision is reduced. A notable exception is polymers: repeating units cause the same bits to be set, so a trimer and a dimer can look identical in terms of their Morgan fingerprints.
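To see why chain length gets lost, here is a toy model (an assumption for illustration, not RDKit's actual Morgan algorithm) of an unbranched chain of identical atoms. Each atom, at each radius up to `radius`, is described only by its distance to the two chain ends, capped because an environment of limited radius cannot "see" arbitrarily far. Once the chain is long enough, adding more atoms adds no new features:

```python
def chain_environments(n_atoms: int, radius: int) -> frozenset:
    """Toy Morgan-like feature set for an unbranched chain of identical atoms.

    At each radius r in 0..radius, an atom is described by its distances to the
    two chain ends, each capped at r + 1. The pair is sorted because the chain
    reads the same from either direction.
    """
    features = set()
    for r in range(radius + 1):
        cap = r + 1
        for i in range(n_atoms):
            left = min(i, cap)
            right = min(n_atoms - 1 - i, cap)
            features.add((r, min(left, right), max(left, right)))
    return frozenset(features)


print(chain_environments(6, 2) == chain_environments(7, 2))  # False
print(chain_environments(7, 2) == chain_environments(8, 2))  # True
```

In this toy, every chain of 7 or more atoms yields the same feature set at radius 2, so the fingerprint cannot distinguish them.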
If you just want to find a solution (not all solutions), there are many ways to reverse engineer a fingerprint. See the discussion on the RDKit mailing list, and another similar discussion here (not reverse engineering Morgan fingerprints, but a different ambiguous molecular representation).
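The simplest way to find *a* solution is generate-and-test: fingerprint candidate molecules and stop at the first exact match. This is not necessarily the approach from the mailing-list discussion. So that the sketch runs without RDKit, it uses a hypothetical character-bigram fingerprint as a stand-in; with RDKit you would instead pass a function that returns the on-bits of the real Morgan bit vector.

```python
import zlib
from typing import Callable, Iterable, Optional


def bigram_fingerprint(smiles: str, n_bits: int = 2048) -> frozenset:
    """Toy stand-in for a Morgan fingerprint: folded character bigrams of a string."""
    return frozenset(
        zlib.crc32(smiles[i:i + 2].encode("utf-8")) % n_bits
        for i in range(len(smiles) - 1)
    )


def find_preimage(target: frozenset, candidates: Iterable[str],
                  fingerprint: Callable[[str], frozenset]) -> Optional[str]:
    """Generate-and-test: return the first candidate whose fingerprint equals target."""
    for smiles in candidates:
        if fingerprint(smiles) == target:
            return smiles
    return None


target = bigram_fingerprint("CCO")
print(find_preimage(target, ["CCCO", "CCO"], bigram_fingerprint))  # CCCO
```

The search happily returns "CCCO" for a fingerprint made from "CCO", because both strings contain exactly the bigrams "CC" and "CO": any match is a valid answer, but not necessarily the original molecule.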
Upvotes: 3