RadaKk

Reputation: 188

Sort tuple list with another list

I have a tuple list to_order such as:

to_order = [(0, 1), (1, 3), (2, 2), (3,2)]

And a list that gives the order in which the second elements of the tuples in to_order should appear:

order = [2, 1, 3]

So I am looking for a way to get this output:

ordered_list = [(2, 2), (3,2), (0, 1), (1, 3)]

Any ideas?

Upvotes: 12

Views: 2423

Answers (4)

Nir Alfasi

Reputation: 53525

You can provide a key function that looks up the index of each tuple's second element in order, and sort by that:

to_order = [(0, 1), (1, 3), (2, 2), (3,2)]
order = [2, 1, 3]
print(sorted(to_order, key=lambda item: order.index(item[1]))) # [(2, 2), (3, 2), (0, 1), (1, 3)]
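A possible variant on this key, sketched under the assumption that every second element appears in order: order.index is a linear scan, so computing the keys costs O(n * m). Precomputing a dict from each value to its position (the name rank is just an illustrative local) makes each lookup O(1) on average, so the whole sort becomes O(m + n log n).

```python
to_order = [(0, 1), (1, 3), (2, 2), (3, 2)]
order = [2, 1, 3]

# Map each value in `order` to its position: {2: 0, 1: 1, 3: 2}
rank = {value: position for position, value in enumerate(order)}

# sorted() is stable, so (2, 2) stays ahead of (3, 2)
ordered_list = sorted(to_order, key=lambda item: rank[item[1]])
print(ordered_list)  # [(2, 2), (3, 2), (0, 1), (1, 3)]
```

Like order.index, the dict lookup fails if a second element is missing from order, but it raises KeyError rather than ValueError.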

EDIT

Since a discussion on time complexities was started, here ya go: the following algorithm runs in O(n + m), where n = len(to_order) and m = len(order), using Eric's input example:

from random import randrange, shuffle
from timeit import timeit

N = 5
to_order = [(randrange(N), randrange(N)) for _ in range(10*N)]
order = list(set(pair[1] for pair in to_order))
shuffle(order)


def eric_sort(to_order, order):
    # Group the pairs by their second element
    bins = {}
    for pair in to_order:
        bins.setdefault(pair[1], []).append(pair)
    # Emit the groups in the sequence given by `order`
    return [pair for i in order for pair in bins[i]]


def alfasin_new_sort(to_order, order):
    # One bucket per key, positioned according to `order`
    arr = [[] for _ in range(len(order))]
    # Map each key to its position in `order`
    d = {k: v for v, k in enumerate(order)}
    for item in to_order:
        arr[d[item[1]]].append(item)
    # Flatten the buckets into a single list
    return [item for sublist in arr for item in sublist]


print("eric_sort", timeit("eric_sort(to_order, order)", globals=globals(), number=1000))
print("alfasin_new_sort", timeit("alfasin_new_sort(to_order, order)", globals=globals(), number=1000))

OUTPUT:

eric_sort 59.282021682999584
alfasin_new_sort 44.28244407700004

Upvotes: 21

Eric Duminil

Reputation: 54223

Algorithm

You can distribute the tuples into a dict of lists keyed by their second element, then iterate over order to build the sorted list:

from collections import defaultdict
to_order = [(0, 1), (1, 3), (2, 2), (3, 2)]
order = [2, 1, 3]

bins = defaultdict(list)

for pair in to_order:
    bins[pair[1]].append(pair)

print(bins)
# defaultdict(<class 'list'>, {1: [(0, 1)], 3: [(1, 3)], 2: [(2, 2), (3, 2)]})

print([pair for i in order for pair in bins[i]])
# [(2, 2), (3, 2), (0, 1), (1, 3)]

Neither sort nor index is needed, and the output is stable: pairs with the same second element keep their original relative order.

This algorithm is similar to the mapping mentioned in the supposed duplicate. This linked answer only works if to_order and order have the same lengths, which isn't the case in OP's question.
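To illustrate the point about lengths, here is a small sketch with inputs chosen for illustration (not from the question): order contains a value (5) that never appears in to_order, and to_order contains a tuple (4, 9) whose second element is not in order. Because bins is a defaultdict, bins[5] simply yields an empty list, while the (4, 9) pair is never visited and so is silently left out of the result.

```python
from collections import defaultdict

to_order = [(0, 1), (1, 3), (2, 2), (3, 2), (4, 9)]
order = [2, 5, 1, 3]

# Group the pairs by their second element
bins = defaultdict(list)
for pair in to_order:
    bins[pair[1]].append(pair)

# bins[5] returns (and stores) an empty list; the bin for 9 is never read
result = [pair for i in order for pair in bins[i]]
print(result)  # [(2, 2), (3, 2), (0, 1), (1, 3)]
```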

Performance

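This algorithm iterates twice over each element of to_order. The complexity is O(n + m), where n = len(to_order) and m = len(order). @alfasin's first algorithm is much slower, O(n * m + n log n): its key calls order.index, an O(m) scan, once per element, and the sort itself is O(n log n). His second one is also O(n + m).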
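To see where the n * m term of the sort-based approach comes from: Python calls the key function exactly once per element, and each call here runs order.index, a linear scan of order. The sketch below wraps that key in a hypothetical counting helper, counted_key, using the question's data.

```python
to_order = [(0, 1), (1, 3), (2, 2), (3, 2)]
order = [2, 1, 3]
calls = []  # every element the key function is called with


def counted_key(item):
    # Hypothetical helper: record the call, then rank as in the sort-based answer
    calls.append(item)
    return order.index(item[1])  # linear scan over `order`


result = sorted(to_order, key=counted_key)
print(result)      # [(2, 2), (3, 2), (0, 1), (1, 3)]
print(len(calls))  # 4, one call per element of to_order
```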

Here's a list with 10000 random pairs of integers between 0 and 999. We extract the unique second elements and shuffle them to define order:

from random import randrange, shuffle
from collections import defaultdict
from timeit import timeit
from itertools import chain

N = 1000
to_order = [(randrange(N), randrange(N)) for _ in range(10*N)]
order = list(set(pair[1] for pair in to_order))
shuffle(order)


def eric(to_order, order):
    bins = defaultdict(list)
    for pair in to_order:
        bins[pair[1]].append(pair)
    return list(chain.from_iterable(bins[i] for i in order))


def alfasin1(to_order, order):
    arr = [[] for i in range(len(order))]
    d = {k:v for v, k in enumerate(order)}
    for item in to_order:
        arr[d[item[1]]].append(item) 
    return [item for sublist in arr for item in sublist]

def alfasin2(to_order, order):
    return sorted(to_order, key=lambda item: order.index(item[1]))

print(eric(to_order, order) == alfasin1(to_order, order))
# True
print(eric(to_order, order) == alfasin2(to_order, order))
# True

print("eric", timeit("eric(to_order, order)", globals=globals(), number=100))
# eric 0.3117517130003762
print("alfasin1", timeit("alfasin1(to_order, order)", globals=globals(), number=100))
# alfasin1 0.36100843100030033
print("alfasin2", timeit("alfasin2(to_order, order)", globals=globals(), number=100))
# alfasin2 15.031453827000405

Upvotes: 20

Galen

Reputation: 1307

Another solution: [item for key in order for item in filter(lambda x: x[1] == key, to_order)]

This solution works off of order first, filtering to_order for each key in order.

Equivalent:

ordered = []
for key in order:
    for item in filter(lambda x: x[1] == key, to_order):
        ordered.append(item)

Shorter, using extend (though I'm not aware of a way to write this exact form as a list comprehension):

ordered = []
for key in order:
    ordered.extend(filter(lambda x: x[1] == key, to_order))

Note: Unlike the order.index approach, this will not throw a ValueError if to_order contains a tuple x where x[1] is not in order; such tuples are simply left out of the result.
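A quick check of that note, using an extra tuple (4, 9) added for illustration: the filter-based comprehension omits it, whereas the order.index key raises ValueError. It also shows the cost of this approach: filter scans all of to_order once per key, so it runs in O(n * m).

```python
to_order = [(0, 1), (1, 3), (2, 2), (3, 2), (4, 9)]
order = [2, 1, 3]

# Filter-based approach: (4, 9) matches no key, so it is silently dropped
filtered = [item for key in order for item in filter(lambda x: x[1] == key, to_order)]
print(filtered)  # [(2, 2), (3, 2), (0, 1), (1, 3)]

# index-based approach: order.index(9) fails because 9 is not in order
try:
    sorted(to_order, key=lambda item: order.index(item[1]))
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```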

Upvotes: 3

user1767754

Reputation: 25094

I personally prefer the list method sort(), which sorts the list in place, to the built-in sorted(), which returns a new list:

to_order = [(0, 1), (1, 3), (2, 2), (3,2)]
order = [2, 1, 3]
to_order.sort(key=lambda x: order.index(x[1]))
print(to_order)
# [(2, 2), (3, 2), (0, 1), (1, 3)]

A little explanation along the way: the key parameter of the sort method computes a key for each element once, and the list is then sorted by those computed keys. In our case, order.index(x[1]) finds the first occurrence of the tuple's second element in order and returns its position.

x = [1, 2, 3, 4, 5, 3, 3, 5]
print(x.index(5))
# 4
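Two further details of list.index that matter for this approach: it accepts an optional start position, and it raises ValueError when the value is absent, which is what happens in the sort if a tuple's second element is not in order. A small sketch reusing the same x list:

```python
x = [1, 2, 3, 4, 5, 3, 3, 5]

print(x.index(5))     # 4, the first occurrence
print(x.index(5, 5))  # 7, searching from position 5 onward

try:
    x.index(9)
    found = True
except ValueError:
    # 9 does not occur anywhere in x
    found = False
print(found)  # False
```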

Upvotes: 2
