DescX

Reputation: 334

Sorting in a list containing lists

I've got the following code:

cases = []

for file in files:

    # Get value from files and write to data
    data = [ id, b, c, d, e, f, g, h, i, j, k ]

    # Append the values to the cases list
    cases.append(data)

# Sort the cases descending
cases.sort(reverse=True)

After running the for loop the cases list looks like this:

cases = [ ['id', val, val], ['id', val, val], ['id', val, val] ] etc.

id is a value like '600', '900', '1009', '1009a' or '1010' which I want to sort descending.

At the moment '1009a' is on top of the list while I want it to be between '1009' and '1010'. This is probably related to '1009a' being parsed as unicode while the other values are being parsed as long. A debugger also confirms this.

I've tried converting the id field to unicode using unicode(id) while writing the data list, but this does not give the desired result either. After sorting cases, the output starts at '999' and runs down to '600', then starts again at '1130' and runs down to '1000'. What I want is for it to start at '1130' and run down to '600', with '1009a' between '1009' and '1010'.

Upvotes: 2

Views: 122

Answers (3)

Ma0

Reputation: 15204

Same principle as the one used by @tobias_k, but not quite as neat.

from itertools import takewhile, dropwhile

cases = [ ['600', 'foo1', 'bar1'], ['900', 'foo2', 'bar2'], ['1009', 'foo6', 'bar6'], ['1009a', 'foo3', 'bar3'], ['1010', 'foo4', 'bar4'] ]

def sorter_helper(str_):
  n = ''.join(takewhile(lambda x: x.isnumeric(), str_))
  s = ''.join(dropwhile(lambda x: x.isnumeric(), str_))
  return (int(n), s)

cases = sorted(cases, key=lambda x: sorter_helper(x[0]))
print(cases)  # -> [['600', 'foo1', 'bar1'], ['900', 'foo2', 'bar2'], ['1009', 'foo6', 'bar6'], ['1009a', 'foo3', 'bar3'], ['1010', 'foo4', 'bar4']]

Upvotes: 0

Arthur Spoon

Reputation: 462

Your problem is that when the ids are unicode strings, they are compared character by character from the left, so '9' > '1' means '900' > '1000'.
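A quick way to see this, as a minimal sketch using a few ids from the question (the variable names are only for illustration):

```python
# A few ids from the question, kept as strings.
ids = ["600", "999", "1000", "1130"]

# Strings compare character by character, so "9" > "1" puts "999" first.
as_strings = sorted(ids, reverse=True)

# Using int as the key compares by numeric value instead.
as_numbers = sorted(ids, key=int, reverse=True)

print(as_strings)  # ['999', '600', '1130', '1000']
print(as_numbers)  # ['1130', '1000', '999', '600']
```

Note that key=int only works while every id is purely numeric; it would raise a ValueError on '1009a'.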

What you need to do is write leading zeros for all your id fields so that 900 becomes 0900 and is now less than 1000. You can do this with this bit of code (although there are probably neater ways of doing it):

id = str(id).zfill(5)

Note that you don't need the str() bit if id is already a string. Here the zfill(5) will add zeros to the left of the string until the string is of length 5.
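One caveat, shown as a rough sketch: zfill pads the whole string, so an id with a suffix such as '1009a' is already five characters long and gets no leading zero. Padding only the leading digits avoids that. The helper name pad_id and the width of 5 are assumptions for illustration, not part of the original code.

```python
import re

def pad_id(raw_id, width=5):
    """Left-pad only the leading digits, keeping any suffix such as 'a'."""
    match = re.match(r"(\d+)(.*)", str(raw_id))
    if match is None:
        # No leading digits: leave the id unchanged.
        return str(raw_id)
    digits, suffix = match.groups()
    return digits.zfill(width) + suffix

ids = ["600", "900", "1009", "1009a", "1010"]

# Plain zfill leaves '1009a' unpadded, so it still sorts to the top.
plain_desc = sorted((i.zfill(5) for i in ids), reverse=True)

# Padding only the digits puts '1009a' between '1009' and '1010'.
padded_desc = sorted((pad_id(i) for i in ids), reverse=True)

print(plain_desc)   # ['1009a', '01010', '01009', '00900', '00600']
print(padded_desc)  # ['01010', '01009a', '01009', '00900', '00600']
```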

Upvotes: 0

tobias_k

Reputation: 82889

If you are comparing strings containing numbers, those are sorted in alphabetic order, i.e. without regarding how many digits the number has. You have to convert those to int first, but that's tricky with the a/b suffix. You can use a regular expression to separate the number and the suffix:

>>> import re
>>> p = re.compile(r"(\d+)(.*)")
>>> def comp(x):
...     n, s = p.match(x).groups()
...     return int(n), s
...
>>> ids = ["1009", "1009a", "1009b", "1010", "99"]
>>> [comp(x) for x in ids]
[(1009, ''), (1009, 'a'), (1009, 'b'), (1010, ''), (99, '')]
>>> sorted(ids, key=comp)
['99', '1009', '1009a', '1009b', '1010']

Applying this to your example, you probably need this (not tested):

cases.sort(key=lambda x: comp(x[0]), reverse=True)
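For reference, here is a self-contained sketch of the same idea run against sample data shaped like the question's cases list. The 'fooN'/'barN' values are placeholders, and the str() call is an assumption to cover ids that were parsed as numbers rather than strings.

```python
import re

# Leading digits, then whatever follows (e.g. the 'a' in '1009a').
p = re.compile(r"(\d+)(.*)")

def comp(x):
    # str() is an assumption, in case some ids are numbers, not strings.
    n, s = p.match(str(x)).groups()
    return int(n), s

# Sample rows; only the first element (the id) matters for sorting.
cases = [
    ['600', 'foo1', 'bar1'],
    ['1009a', 'foo3', 'bar3'],
    ['1010', 'foo4', 'bar4'],
    ['900', 'foo2', 'bar2'],
    ['1009', 'foo6', 'bar6'],
]

# Sort in place, descending, by (number, suffix).
cases.sort(key=lambda x: comp(x[0]), reverse=True)

print([row[0] for row in cases])  # ['1010', '1009a', '1009', '900', '600']
```

This assumes every id starts with at least one digit; otherwise p.match returns None and comp raises an AttributeError.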

Upvotes: 4
