Python - set somehow getting duplicate data

Question

I have a class definition with a __hash__ method that uses the object's properties to create a unique key for comparison in Python sets.

The hash method looks like this:

def __hash__(self):
    return int('%d%s' % (self.id, self.create_key))
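As an aside, building the key by joining the two values as text does not by itself make it unique, because different (id, create_key) pairs can produce the same integer. This is a minimal sketch, assuming id is an int and create_key is a string of digits (the question does not say what create_key holds). concat_key and the values are hypothetical and chosen only for illustration:

```python
def concat_key(obj_id, create_key):
    # Hypothetical helper mirroring the question's __hash__: join the id and
    # create_key as decimal text, then parse the result back into an int.
    return int('%d%s' % (obj_id, create_key))

# Two different (id, create_key) pairs; illustrative values only.
print(concat_key(1, '23'))   # 123
print(concat_key(12, '3'))   # 123

# Keeping the fields as a tuple preserves the distinction.
print((1, '23') == (12, '3'))  # False
```

Sets tolerate hash collisions by falling back to an equality check, so a collision on its own neither merges nor duplicates entries. Still, a tuple of the fields is a more reliable basis for a "unique key" than string concatenation.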

In the module that uses this class, several queries are run that could conceivably construct duplicate instances of it. The function responsible for this therefore stores its queue in a set, so that the duplicates are dropped:

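in_set = set()
out_set = set()

# Collect the Perceptrons linked to each input id.
for inid in inids:
    ps = Perceptron.getwherelinked(inid, self.in_ents)
    for p in ps:
        in_set.add(p)

# Collect the Perceptrons linked to each pool id.
for poolid in poolids:
    ps = Perceptron.getwherelinked(poolid, self.out_ents)
    for p in ps:
        out_set.add(p)

# Merge both sets into one.
return in_set.union(out_set)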


Somehow, despite calling the union method, I am still getting two duplicate instances. When printed (with a __str__ method in the Perceptron class that just returns the hash), the two hashes are identical, which I thought shouldn't be possible.

set([1630, 1630])
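The symptom can be reproduced with a small stand-in class. HashOnly and its field values are hypothetical; the class copies only the question's __hash__ and displays its hash the way the asker's __str__ does. The sketch is written for Python 3, where a set prints as {...} rather than set([...]):

```python
class HashOnly(object):
    """Hypothetical stand-in for Perceptron: defines __hash__ but no __eq__."""

    def __init__(self, obj_id, create_key):
        self.id = obj_id
        self.create_key = create_key

    def __hash__(self):
        # Same construction as in the question.
        return int('%d%s' % (self.id, self.create_key))

    def __repr__(self):
        # Show the hash, like the asker's __str__ does.
        return str(hash(self))


a = HashOnly(16, '30')
b = HashOnly(16, '30')  # a separate object built from the same data

s = set([a]).union(set([b]))
print(s)       # {1630, 1630}
print(len(s))  # 2
print(a == b)  # False: without __eq__, == falls back to identity
```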

Any guidance would be appreciated.

Ignacio Vazquez-Abrams · Accepted Answer

If a class does not define a __cmp__() or __eq__() method it should not define a __hash__() operation either.

(Source: the __hash__() entry in the Python 2 data model documentation.)

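Define __eq__() as well. A set uses __hash__() only to find candidate matches; it then compares the candidates for equality. Without __eq__(), objects compare by identity, so two separate instances with the same hash are still treated as distinct, and both stay in the set.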
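The fix the answer describes can be sketched as follows. Keyed and _key are hypothetical names. The sketch assumes that id and create_key are the fields that make two instances "the same", and it hashes a tuple of those fields instead of the concatenated string. It runs on Python 2 and 3; __ne__ is only needed on Python 2, which does not derive != from __eq__:

```python
class Keyed(object):
    """Hypothetical Perceptron stand-in that defines __eq__ and __hash__ together."""

    def __init__(self, obj_id, create_key):
        self.id = obj_id
        self.create_key = create_key

    def _key(self):
        # The fields that define "sameness"; a tuple keeps (1, '23') distinct from (12, '3').
        return (self.id, self.create_key)

    def __eq__(self, other):
        if not isinstance(other, Keyed):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        # Python 2 does not derive != from __eq__; on Python 3 this is redundant but harmless.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    def __hash__(self):
        # Equal objects must hash equally, so hash exactly the fields __eq__ compares.
        return hash(self._key())


a = Keyed(16, '30')
b = Keyed(16, '30')  # a separate object built from the same data

s = set([a]).union(set([b]))
print(len(s))  # 1
print(len(set([Keyed(1, '23'), Keyed(12, '3')])))  # 2
```

The rule that matters is that objects which compare equal must have equal hashes, which is why __hash__ here is built from exactly the fields that __eq__ compares.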
