Terry

Reputation: 66103

Sorting tuples based on another list

I am working with clustered data that is being generated by SciPy, and would love to order my data with a custom sort order.

Let's say that my data comes out looking like this:

leafIDs = [4,5,3,1,2]
rowHeaders = ['lorem','ipsum','dolor','sit','amet']

There is a one-to-one correspondence between the two lists, leafIDs and rowHeaders. Both will always be the same length. For example, the row with the header lorem has a leaf ID of 4, ipsum has an ID of 5, and so on. Note that the leafIDs themselves are not the order I want to sort by (otherwise I could use the tried and tested method). The intended one-to-one correspondence can be visualised as follows:

+---------+------------+
| leafIDs | rowHeaders |
+---------+------------+
|       4 | lorem      |
|       5 | ipsum      |
|       3 | dolor      |
|       1 | sit        |
|       2 | amet       |
+---------+------------+

Now I would like to sort these two lists by a custom order, which will again always be the same length as both of the aforementioned lists. You can think of it as a scrambled version of rowHeaders:

rowHeaders_custom = ['amet','lorem','sit','ipsum','dolor']

The desired outcome is for leafIDs to be sorted based on rowHeaders_custom and its one-to-one relationship with rowHeaders, i.e.:

# Desired outcome
leafIDs_custom = [2,4,1,5,3]

What I've tried so far: my current approach is as follows:

  1. Zip leafIDs and rowHeaders, i.e. zippedRows = zip(leafIDs, rowHeaders).
  2. Attempt to sort the list of tuples by the list rowHeaders_custom.

However, I am hitting a roadblock on the second step. It would be nice to have any suggestions on how to perform this custom-ordered sort. I understand I might be running into an XY problem by attempting to order a list of tuples with another list, but my understanding of sort() is rather limited.

Upvotes: 0

Views: 326

Answers (2)

Pynchia

Reputation: 11580

I presume you have several rows to rearrange, not just one.

Here is a solution that performs the translation of the columns only once, without building a mapping for every row (tuple) to be sorted. After all, the destinations remain the same.

It records the original position of each header and then builds the rearranged tuples by picking values from those positions:

leaf_lst = [(4,5,3,1,2), (1,2,3,4,5), (6,7,8,9,0)]
rowHeaders = ['lorem','ipsum','dolor','sit','amet']
rowHeaders_custom = ['amet','lorem','sit','ipsum','dolor']

old_pos = tuple(rowHeaders.index(h) for h in rowHeaders_custom)
leaf_lst_custom = [tuple(t[p] for p in old_pos) for t in leaf_lst]
print(leaf_lst_custom)

produces

[(2, 4, 1, 5, 3), (5, 1, 4, 2, 3), (0, 6, 9, 7, 8)]
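Each rowHeaders.index(h) call scans the list from the start, which may matter when there are many headers. Here is a minimal sketch of the same idea with a precomputed position lookup, assuming the headers are unique; the name pos_of is introduced here purely for illustration and is not part of the code above.

```python
leaf_lst = [(4, 5, 3, 1, 2), (1, 2, 3, 4, 5), (6, 7, 8, 9, 0)]
rowHeaders = ['lorem', 'ipsum', 'dolor', 'sit', 'amet']
rowHeaders_custom = ['amet', 'lorem', 'sit', 'ipsum', 'dolor']

# pos_of (hypothetical name): header -> its column index in rowHeaders.
# Assumes every header is unique; duplicates would overwrite each other.
pos_of = {h: i for i, h in enumerate(rowHeaders)}

# Original column index of each header, listed in the custom order.
old_pos = [pos_of[h] for h in rowHeaders_custom]

# Rebuild every row by picking values from those original positions.
leaf_lst_custom = [tuple(t[p] for p in old_pos) for t in leaf_lst]
print(leaf_lst_custom)
```

This should print the same three tuples as the version above; only the way old_pos is computed differs.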

Upvotes: 2

Linus Thiel

Reputation: 39223

What if you make a dict out of the zippedRows? I.e.

>>> dict(zip(rowHeaders, leafIDs))
{'ipsum': 5, 'sit': 1, 'lorem': 4, 'amet': 2, 'dolor': 3}

Capturing that, then:

dictRows = dict(zip(rowHeaders, leafIDs))

You could just pull the values out of that:

leafIDs_custom = [dictRows[v] for v in rowHeaders_custom]
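Put together with the data from the question, the snippets above can be run as one piece. This is only a sketch, and it assumes the headers are unique, since a repeated header would overwrite its earlier entry in the dict.

```python
leafIDs = [4, 5, 3, 1, 2]
rowHeaders = ['lorem', 'ipsum', 'dolor', 'sit', 'amet']
rowHeaders_custom = ['amet', 'lorem', 'sit', 'ipsum', 'dolor']

# Map each header to its leaf ID (assumes unique headers).
dictRows = dict(zip(rowHeaders, leafIDs))

# Look the IDs up in the custom header order.
leafIDs_custom = [dictRows[v] for v in rowHeaders_custom]
print(leafIDs_custom)  # [2, 4, 1, 5, 3]
```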

I don't know, there might be a more pythonic way to do it, but that's the solution I'm coming up with.
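If you would rather keep the zipped tuples and sort them, as in the second step of the question, sorted() accepts a key function. Below is a minimal sketch, again assuming unique headers; the name rank is made up for this example.

```python
leafIDs = [4, 5, 3, 1, 2]
rowHeaders = ['lorem', 'ipsum', 'dolor', 'sit', 'amet']
rowHeaders_custom = ['amet', 'lorem', 'sit', 'ipsum', 'dolor']

# rank (hypothetical name): header -> its position in the custom order.
rank = {h: i for i, h in enumerate(rowHeaders_custom)}

# Sort the (leafID, header) pairs by where each header sits in the custom order.
zippedRows = sorted(zip(leafIDs, rowHeaders), key=lambda pair: rank[pair[1]])

# Unzip the leaf IDs again.
leafIDs_custom = [leaf for leaf, _ in zippedRows]
print(leafIDs_custom)  # [2, 4, 1, 5, 3]
```

This keeps each ID paired with its header in zippedRows, which may be handy if both are needed afterwards.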

Upvotes: 4
