hopheylalaley
hopheylalaley

Reputation: 69

Creating array of unique objects in Python

Let's suppose I have a program that creates some scheme with lines and points. All lines determine by two points. There are these classes:

class Coordinates(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Point(object):
    def __init__(self, coordinates):
        self.coordinates = coordinates

class Line(object):
    def __init__(self, coordinates_1, coordinates_2):
        self.coordinates_1 = coordinates_1
        self.coordinates_2 = coordinates_2

A scheme takes list of lines and creates a list of unique points.

class Circuit(object):
    def __init__(self, element_list):
        self.line_list = element_list
        self.point_collection = set()
        self.point_collection = self.generate_points()

    def generate_points(self):
        for line in self.line_list:
            coordinates_pair = [line.coordinates_1, line.coordinates_2]
            for coordinates in coordinates_pair:
                self.point_collection.add(Point(coordinates))
        return self.point_collection

What variants are able to make a list or collection of unique objects? How to do it without using sets and sorting, only with loops and conditions? And how to do it simplier?

UPD. Code I attached doesn't work properly. I tried to add hash and eq methods in Point class:

class Point(object):
def __init__(self, coordinates):
    self.coordinates = coordinates

def __hash__(self):
    return 0

def __eq__(self, other):
    return True

Then I try to make a scheme with some lines:

element_list=[]
element_list.append(Line(Coordinates(0,0), Coordinates(10,0)))
element_list.append(Line(Coordinates(10,0), Coordinates(10,20)))

circuit = Circuit(element_list)

print(circuit.point_collection)

Two lines here equal four points, where two points have the same coordinates. Hence, the code must print three objects, but it does only one:

{<__main__.Point object at 0x0083E050>}

Upvotes: 2

Views: 4481

Answers (1)

arekolek
arekolek

Reputation: 9621

Short answer:

You need to implement __hash__() and __eq__() methods in your Point class. For an idea, see this answer showing a correct and good way to implement __hash__().

Long answer:

The documentation says that:

A set object is an unordered collection of distinct hashable objects. Common uses include (...) removing duplicates from a sequence (...)

And hashable means:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().

Which explains why your code does not remove duplicate points.

Consider this implementation that makes all instances of Foo distinct and all instances of Bar equal:

class Foo:
  pass


class Bar:
  def __hash__(self):
    return 0

  def __eq__(self, other):
    return True

Now run:

>>> set([Foo(), Foo()])
{<__main__.Foo at 0x7fb140791da0>, <__main__.Foo at 0x7fb140791f60>}

>>> set([Bar(), Bar()])
{<__main__.Bar at 0x7fb1407c5780>}

In your case, __eq__ should return True when both coordinates are equal, while __hash__ should return a hash of the coordinate pair. See the answer mentioned earlier for a good way to do this.

Some remarks:

Your Point class has currently no reason to exist from a design perspective, since it is just a wrapper around Coordinates and offers no additional functionality. You should just use either one of them, for example:

class Point(object):
  def __init__(self, x, y):
    self.x = x
    self.y = y

And why not call coordinates_1 and coordinates_2 just a and b?

class Line(object):
  def __init__(self, a, b):
    self.a = a
    self.b = b

Also, your generate_points could be implemented in a more pythonic way:

def generate_points(self):
  return set(p for l in self.line_list for p in (l.a, l.b))

Finally, for easier debugging, your might consider implementing __repr__ and __str__ methods in your classes.

Upvotes: 6

Related Questions