titto.sebastian
titto.sebastian

Reputation: 531

Faster way to sort a list of objects based on a attribute in Python

I've got a list of object in python which i want to sort based on a attribute.

For eg:

abc is a class with attributes id and count.

I have a list objects of the class abc.

list=[abc('1',120),abc('2',0),abc('0',180),abc('5',150)].

I want to sort the list in ascending order of the attribute 'count'

I have done it using:

list.sort(key=attrgetter('count'))

I have found using profiling my python script that it takes lot of time for sorting.

Can anyone suggest a better and faster way to sort list of object based on a attribute minimizing the time for sorting.

Upvotes: 2

Views: 4905

Answers (3)

matino
matino

Reputation: 17715

I believe the sort method is implemented using Timsort algorithm, so there is not much you can improve in terms of sorting.

What you can do is to insert the elements differently provided you have the control over the inserting part of code.
For example you could use a binary heap to optimize retreiving of the largest element (see heapq module in Python) or binary search tree to maintain the sorting order. The data strcuture you choose, mainly depends on what do you want to do with the elements.

Upvotes: 0

juhist
juhist

Reputation: 4314

A nitpick: you're using the name list for your list which will overwrite the standard list class. You'd be better off using l as the list name.

I tested sorting a list containing 12 times the contents of your list 100000 times. It took 0.848 s without a comparator function or a key when I used the sorted() function to avoid re-sorting an already sorted list.

There are at least three ways I can think of:

A. Use the sort() with the comparator function:

def comparator(x, y):
  return cmp(x.count, y.count)
l.sort(cmp=comparator)

This took 9.598 s on my system when I used the sorted() function to avoid re-sorting an already sorted list.

B. Use the sort() with the key function:

l.sort(key=operator.attrgetter('count'))

This took 3.111 s on my system when I used the sorted() function to avoid re-sorting an already sorted list.

C. Use native C code to improve the performance of the sorting. I didn't test this.

So, you seem to be already using the fastest all-Python way there is and the way forward would be the use of native C code.

Upvotes: 2

Fomalhaut
Fomalhaut

Reputation: 9745

If I understand you correctly:

class Abc(object):
    def __init__(self, name, count):
        self.name = name
        self.count = count

    @classmethod
    def sort_key(cls, key):
        if key == 'count':
            return lambda obj: obj.count
        elif key == 'name':
            return lambda obj: obj.name


lst = [Abc('1', 120), Abc('2', 0), Abc('0', 180), Abc('5', 150)]

lst.sort(key=Abc.sort_key('count'))
for e in lst:
    print e.name, e.count
print

lst.sort(key=Abc.sort_key('name'))
for e in lst:
    print e.name, e.count
print

I'd not recommend you use 'id', 'abc' and 'list' as names of arbitrary variables because they are keywords in python.

Upvotes: 0

Related Questions