Reputation: 17333
In the PDB module of Biopython, PDB structures are parsed into Structure objects, which store the components of the structure in an SMCRA architecture (Structure/Model/Chain/Residue/Atom). Each level of this hierarchy is represented by an object that inherits from the Entity container class.
My problem is that at no point can any two Entity objects be equal.
Structures built from the same file are not equal:
>>> from Bio import PDB
>>> parser = PDB.PDBParser()
>>> struct1 = parser.get_structure("1hgg", "pdb1hgg.ent")
>>> struct2 = parser.get_structure("1hgg", "pdb1hgg.ent")
>>> struct1 == struct2
False
Residues within that structure are not equal:
>>> first_res1 = next(struct1.get_residues())
>>> first_res2 = next(struct2.get_residues())
>>> first_res1 == first_res2
False
And so on.
If we were to parse the same PDB file separately, at no point could any of the Entity objects within the structures be equal.
The obvious solution to this problem is to never parse the same PDB file twice. Then, we have object identity and thus, equivalence. However, this answer seems incomplete to me.
Each Entity object can return an identification tuple with get_full_id(). This method gives all IDs from the top object down; it should be unique for each Entity within a structure, and unique across all structures if the proper PDB ID was supplied when constructing the Structure object.
My solution for testing Entity equivalence is merely to compare this full ID. That is:
def __eq__(self, other):
    return self.get_full_id() == other.get_full_id()
At this point, I'm asking if my implementation of Entity equivalence is sensible.
Also, is there a reason __eq__ was left unimplemented within the PDB module?
Upvotes: 1
Views: 244
Reputation: 2629
One common reason for not defining __eq__ is that it makes objects unhashable (so you can't use them as dictionary keys or put them in sets) unless you also define a consistent __hash__, which is only safe if your objects are immutable.
By default, __hash__ for objects is based on the object's identity (its id()), which works even for mutable objects, since the identity never changes during the object's lifetime. But if you define a custom __eq__, you can't keep hashing by identity, or two objects could compare as equal but have different hashes, which is inconsistent with how hashing is supposed to work. So you have to define a custom __hash__ (which you can do), but if your object is mutable you shouldn't really do that either, so you'll just have an unhashable object. That may be all right for you.
See more info in the Python docs here.
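To make the trade-off concrete, here is a minimal sketch using toy classes rather than Biopython's real Entity. The class names, the full_id attribute, and the is_hashable helper are all made up for illustration, and the behavior shown assumes Python 3, where defining __eq__ without __hash__ sets __hash__ to None.

```python
class ById:
    """No custom __eq__: equality and hashing both fall back to identity."""
    def __init__(self, full_id):
        self.full_id = full_id


class ByFullId:
    """Custom __eq__ only: Python 3 sets __hash__ to None for this class."""
    def __init__(self, full_id):
        self.full_id = full_id

    def __eq__(self, other):
        if not isinstance(other, ByFullId):
            return NotImplemented
        return self.full_id == other.full_id


class ByFullIdHashable(ByFullId):
    """Adds a __hash__ consistent with __eq__.

    Only safe as long as full_id never changes after construction.
    """
    def __hash__(self):
        return hash(self.full_id)


def is_hashable(obj):
    """Hypothetical helper: True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Illustrative tuple in the shape of a residue full ID (an assumption here).
fid = ("1hgg", 0, "A", (" ", 10, " "))

print(ById(fid) == ById(fid))            # identity-based comparison
print(is_hashable(ByFullId(fid)))        # custom __eq__, no __hash__
print(len({ByFullIdHashable(fid), ByFullIdHashable(fid)}))
```

With the third class, two separately built objects collapse into a single set element, which is exactly the behavior that becomes dangerous if full_id can be changed while the object sits in a set or dict.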
So you can use a custom __eq__ as long as you don't need your objects to be hashable, or if they're immutable; otherwise things get more complicated. Or you could just leave __eq__ alone and name your full ID comparison function something else, so as to not break hashability.
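The "name it something else" option could look roughly like this. The function name same_full_id and the FakeEntity stand-in are hypothetical; the only thing assumed about real Entity objects is the get_full_id() method mentioned in the question.

```python
def same_full_id(entity_a, entity_b):
    """Return True if both objects report the same full ID.

    Leaves __eq__ and __hash__ untouched, so identity-based hashing
    keeps working for the objects being compared.
    """
    return entity_a.get_full_id() == entity_b.get_full_id()


class FakeEntity:
    """Stand-in for a Bio.PDB Entity; only get_full_id() is modeled."""
    def __init__(self, full_id):
        self._full_id = full_id

    def get_full_id(self):
        return self._full_id


# Two separately constructed objects with the same (illustrative) full ID.
res1 = FakeEntity(("1hgg", 0, "A", (" ", 10, " ")))
res2 = FakeEntity(("1hgg", 0, "A", (" ", 10, " ")))

print(same_full_id(res1, res2))  # compares IDs explicitly
print(res1 == res2)              # default identity comparison is unchanged
```

The objects remain usable as dictionary keys and set members, and callers opt in to ID-based comparison only where they want it.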
I don't know enough about what PDB IDs mean (in particular, whether false positives are possible) to tell whether your __eq__ implementation is reasonable from that standpoint.
Upvotes: 1