fyngyrz
fyngyrz

Reputation: 2658

Python3 style sorting -- old cmp method functionality in new key mechanism?

I read about the wrapper function to move a cmp style comparison into a key style comparison in Python 3, where the cmp capability was removed.

I'm having a heck of a time wrapping my head around how a Python3 straight key style sorted() function, with, at least as I understand it, just one item specified for the key, can allow you to properly compare, for instance, two IPs for ordering. Or ham calls.

Whereas with cmp there was nothing to it: sorted() and sort() called you with the two ips, you looked at the appropriate portions, made your decisions, done.

def ipCompare(dqA,dqB):
    ...

ipList = sorted(ipList,cmp=ipCompare)

Same thing with ham radio calls. The sorting isn't alphabetic; the calls are generally letter(s)+number(s)+letter(s); the first sorting priority is the number portion, then the first letter(s), then the last letter(s.)

Using cmp... no sweat.

def callCompare(callA,callB):
    ...

hamlist = sorted(hamlist,cmp=callCompare)

With Python3... without going through the hoop jumping of the wrapper... and being passed one item... I think... how can that be done?

And if the wrapper is absolutely required... then why remove cmp within Python3 in the first place?

I'm sure I'm missing something. I just can't see it. :/


ok, now I know what I was missing. Solutions for IPs were given in the answers below. Here's a key I came up with for sorting ham calls of the common prefix, region, postfix form:

import re

def callKey(txtCall):
    l = re.split('([0-9]*)',txtCall.upper(),1)
    return l[1],l[0],l[2]

hamList = ['N4EJI','W1AW','AA7AS','na1a']

sortedHamList = sorted(hamList,key=callKey)

sortedHamList result is ['na1a','W1AW','N4EJI','AA7AS']

Detail:

Upvotes: 3

Views: 2146

Answers (3)

markroxor
markroxor

Reputation: 6476

According to official docs - https://docs.python.org/3/howto/sorting.html#the-old-way-using-the-cmp-parameter

When porting code from Python 2.x to 3.x, the situation can arise when you have the user supplying a comparison function and you need to convert that to a key function. The following wrapper makes that easy to do:

def cmp_to_key(mycmp):
    'Convert a cmp= function into a key= function'
    class K:
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        def __ne__(self, other):
            return mycmp(self.obj, other.obj) != 0
    return K

To convert to a key function, just wrap the old comparison function:

>>> def reverse_numeric(x, y):
...     return y - x

>>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
[5, 4, 3, 2, 1]

Upvotes: 1

abarnert
abarnert

Reputation: 365697

First, if you haven't read the Sorting HOWTO, definitely read that; it explains a lot that may not be obvious at first.


For your first example, two IPv4 addresses, the answer is pretty simple.

To compare two addresses, one obvious thing to do is convert them both from dotted-four strings into tuples of 4 ints, then just compare the tuples:

def cmp_ip(ip1, ip2):
    ip1 = map(int, ip1.split('.'))
    ip2 = map(int, ip2.split('.'))
    return cmp(ip1, ip2)

An even better thing to do is convert them to some kind of object that represents an IP address and has comparison operators. In 3.4+, the stdlib has such an object built in; let's pretend 2.7 did as well:

def cmp_ip(ip1, ip2):
    return cmp(ipaddress.ip_address(ip1), ipaddress.ip_address(ip2))

It should be obvious that these are both even easier as key functions:

def key_ip(ip):
    return map(int, ip.split('.'))

def key_ip(ip):
    return ipaddress.ip_address(ip)

For your second example, ham radio callsigns: In order to write a cmp function, you have to be able to break each ham address into the letters, numbers, letters portions, then compare the numbers, then compare the first letters, then compare the second letters. In order to write a key function, you have to be able to break down a ham address into the letters, numbers, letters portions, then return a tuple of (numbers, first letters, second letters). Again the key function is actually easier, not harder.


And really, this is the case for most examples anyone was able to come up with. Most complicated comparisons ultimately come down to a complicated conversion into some sequence of parts, and then simple lexicographical comparison of that sequence.

That's why cmp functions were deprecated way back in 2.4 and finally removed in 3.0.


Of course there are some cases where a cmp function is easier to read—most of the examples people try to come up with turn out to be wrong, but there are some. And there's also code which has been working for 20 years and nobody wants to rethink it in new terms for no benefit. For those cases, you've got cmp_to_key.


There's actually another reason cmp was deprecated, on top of this one, and maybe a third.

In Python 2.3, types had a __cmp__ method, which was used for handling all of the operators. In 2.4, they grew the six methods __lt__, __eq__, etc. as a replacement. This allows for more flexibility—e.g., you can have types that aren't total-ordered. So, 2.3's when compared a < b, it was actually doing a.__cmp__(b) < 0, which maps in a pretty obvious way to a cmp argument. But in 2.4+, a < b does a.__lt__(b), which doesn't. This confused a lot of people over the years, and removing both __cmp__ and the cmp argument to sort functions removed that confusion.

Meanwhile, if you read the Sorting HOWTO, you'll notice that before we had cmp, the only way to do this kind of thing was decorate-sort-undecorate (DSU). Notice that it's blindly obvious how to map a good key function to a good DSU sort and vice-versa, but it's definitely not obvious with a cmp function. I don't remember anyone explicitly mentioning this one on the py3k list, but I suspect people may have had it in their heads when deciding whether to finally kill cmp for good.

Upvotes: 6

user2864740
user2864740

Reputation: 61875

To use the new key argument, simply decompose the comparison to another object that already implements a well-ordered comparison, such as to a tuple or list (eg. a sequence of integers). These types work well because they are sequence-wise ordered.

def ip_as_components (ip):
    return map(int, ip.split('.'))

sorted_ips = sorted(ips, key=ip_as_components)

The ordering of the each of the components are the same as the individual tests as found in a traditional compare-and-then-compare-by function.

Looking at the HAM ordering it may look like:

def ham_to_components (ham_code):
    # .. decompose components based on ordering of each
    return (prefix_letters, numbers, postfix_letters)

The key approach (similar to "order by" found in other languages) is generally a simpler and more natural construct to deal with - assuming that the original types are not already well-ordered. The main drawback with this approach is that partially reversed (eg. asc then desc) ordering can be tricky, but that is solvable by returning nested tuples etc.

In Py3.0, the cmp parameter was removed entirely (as part of a larger effort to simplify and unify the language, eliminating the conflict between rich comparisons and the __cmp__() magic method).

If absolutely needing sorted with a custom "cmp", cmp_to_key can be trivially used.

sorted_ips = sorted(ips, key=functools.cmp_to_key(ip_compare))

Upvotes: 4

Related Questions