Chaim Leib Halbert
Chaim Leib Halbert

Reputation: 2334

Python: sorting Intervals with __cmp__, with __lt__ meaning "strictly less than"

I have an intervaltree library that needs to sort Intervals and eventually Points.

I have a __cmp__ method that imposes very stable and logical sorting (see end for the code). And, I have a useful __lt__ method for seeing if intervals are strictly less than each other (ditto).

I have about 30 tests for this behavior, and this is going fine...

The problem

...Except when I have to sort:

>>> sorted([Interval(0, 10), Interval(-10, 5)])
[Interval(0, 10), Interval(-10, 5)]               # WRONG!

Because Python uses __lt__, it uses "strictly less-than" semantics instead of __cmp__. I can force it to do that explicitly, but I'd rather avoid the cumbersome syntax:

>>> sorted([Interval(0, 10), Interval(-10, 5)], cmp=Interval.__cmp__)
[Interval(-10, 5), Interval(0, 10)]               # RIGHT

In Python 3, the syntax is even more cumbersome (not available in Python 2.6, available in Python 2.7).

>>> from functools import cmp_to_key
>>> sorted([Interval(0, 10), Interval(-10, 5)], key=cmp_to_key(Interval.__cmp__))

What I want

Is there a way I can make sorted() and friends use Interval.__cmp__ automatically, but still keep __lt__ as-is? That is, I want this elegant behavior:

>>> sorted([Interval(0, 10), Interval(-10, 5)])   # no extra arguments!
[Interval(-10, 5), Interval(0, 10)]

Appendix: Implementations for __cmp__ and __lt__

def __cmp__(self, other):
    """
    Tells whether other sorts before, after or equal to this
    Interval.

    Sorting is by begins, then by ends, then by data fields.

    If data fields are not both sortable types, data fields are
    compared alphabetically by type name.
    :param other: Interval
    :return: -1, 0, 1
    :rtype: int
    """
    s = self[0:2]
    try:
        o = other[0:2]
    except:
        o = (other,)
    if s != o:
        return -1 if s < o else 1
    try:
        if self.data == other.data:
            return 0
        return -1 if self.data < other.data else 1
    except TypeError:
        s = type(self.data).__name__
        o = type(other.data).__name__
        if s == o:
            return 0
        return -1 if s < o else 1

def __lt__(self, other):
    """
    Less than operator. Returns False if there is an overlap.
    :param other: Interval or point
    :return: True or False
    :rtype: bool
    """
    return not self.overlaps(other) and self.end <= other

PS: This problem is in the development branch, where I want to change __lt__'s behavior to "strictly less-than." In the master branch (release 1.1.0), __lt__ just parrots __cmp__.

Update

Upvotes: 1

Views: 1061

Answers (1)

shx2
shx2

Reputation: 64328

I strongly suggest against defining any of the order operators for your class.

The reason is that existence of the order operators implies an order-relation exists, along with its associated semantics, which your class inherently violates (there is no well-defined order relation on intervals).

For example, an order-relation implies that:

not(a<b) and not(b<a) ==> a==b

This is not true for intervals.

In other words, you shouldn't just decide what you want your operators to mean (e.g. that __lt__ means strict less-than), the semantics is predfined in the languauge for you (e.g. the implementation of sorted relies on that semantics).

What you should do is define non-operator methods for comparing (e.g. def strict_lt(self, other): ...), and use them. I'd also define a sort_intervals function, and use it in your code instead of sorted directly. E.g.:

def sort_intervals(lst):
    return sorted(lst, cmp = ...)

Upvotes: 3

Related Questions