Python: sorting strings by only one of their parts

Through os.listdir() I have created a list of hundreds of files picked up from a folder. All the filenames have the following pattern:

obj1__5
obj1__10
obj1__15
...
...
obj1__250
...
obj2__5
obj2__10
...
obj2__250
... and so on up to obj99

The files in the folder are ordered following this scheme; however, when using os.listdir() I got a list ordered this way:

obj1__0.png
obj1__10.png
obj1__100.png
obj1__105.png
...
obj1__145.png
obj1__15.png
obj1__150.png
obj1__155.png
...
obj1__190.png
obj1__195.png
obj1__20.png
obj1__200.png
obj1__205.png
... and so on

Is there any way to pick up the files in the same order in which they are displayed in the folder? Or is there a sorting function I can use to put them back in their proper order? Thanks

Upvotes: 0

Views: 98

Answers (3)

zenofsahil

Reputation: 1753

This should work for you.

import os
import re

def splitter(name):
    # Pull out the object number and the trailing number,
    # e.g. "obj1__15.png" -> (1, 15)
    reg = re.search(r"(\d+)__(\d+)", name)
    return (int(reg.group(1)), int(reg.group(2)))

# Sort directly on the numeric tuple returned by splitter
sortedFiles = sorted(os.listdir(), key=splitter)

The key passed to sorted() is a tuple, which gives a multi-level sort: names are ordered by the first number (the object index), and names with the same first number are then ordered by the second number.
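To illustrate the tuple ordering described above, here is a minimal sketch on a hard-coded list. The names in `sample` are made up to match the pattern in the question, and `numeric_key` is a hypothetical helper, not part of the code above. Sending non-matching names to the end instead of raising an error is an assumption about what you would want.

```python
import re

# Hypothetical pattern and helper: "obj1__15.png" -> (1, 15).
PATTERN = re.compile(r"(\d+)__(\d+)")

def numeric_key(name):
    match = PATTERN.search(name)
    if match is None:
        # Assumption: names that do not fit the pattern sort last.
        return (float("inf"), float("inf"))
    return (int(match.group(1)), int(match.group(2)))

# Made-up sample names following the question's pattern.
sample = ["obj1__100.png", "obj2__5.png", "obj1__15.png", "notes.txt", "obj1__5.png"]

ordered = sorted(sample, key=numeric_key)
print(ordered)
# ['obj1__5.png', 'obj1__15.png', 'obj1__100.png', 'obj2__5.png', 'notes.txt']
```

Tuples compare element by element, so the second number only matters when the first numbers are equal.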

Upvotes: 1

kindall

Reputation: 184211

A general-purpose natural sorting function is something like this:

import re

def naturalsort(name, digits=re.compile("([0-9]+)")):
    return [int(x) if x.isdigit() else x for x in digits.split(name)]

You get back a list that contains integer values of the runs of digits and string versions of the rest. You can use this as the key when sorting:

sorted(os.listdir(), key=naturalsort)

You might think that this would cause problems in Python 3 when you compare e.g. "abc.txt" with "123.txt", since comparing a str with an int raises a TypeError there. It still works: because we're splitting on runs of digits, the first element of the key is '' for names that start with a run of digits, which puts numbered items before any alphabetic ones, as they should be. Put another way, the first element of the key is always a string (which might be empty), the second is always an integer, and so on, alternating to the end of the name. Therefore Python never tries to compare different types at the same position.
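The alternating string/integer structure is easy to see by printing the keys. This is a small sketch that repeats the naturalsort function from above; the names in `sample` are made up for illustration.

```python
import re

def naturalsort(name, digits=re.compile("([0-9]+)")):
    return [int(x) if x.isdigit() else x for x in digits.split(name)]

# Made-up names: one starts with digits, one has no digits at all.
sample = ["abc.txt", "obj1__10.png", "123.txt", "obj1__5.png"]

for name in sample:
    print(name, "->", naturalsort(name))
# abc.txt -> ['abc.txt']
# obj1__10.png -> ['obj', 1, '__', 10, '.png']
# 123.txt -> ['', 123, '.txt']
# obj1__5.png -> ['obj', 1, '__', 5, '.png']

ordered = sorted(sample, key=naturalsort)
print(ordered)
# ['123.txt', 'abc.txt', 'obj1__5.png', 'obj1__10.png']
```

Every key starts with a string, so "123.txt" (whose key starts with '') is compared with "abc.txt" string to string and sorts first.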

Upvotes: 3

coder

Reputation: 12972

You can try this:

>>> l = ['obj1__0.png', 'obj1__10.png', 'obj3__15.png', 'obj1__15.png', 'obj2__15.png', 'obj1__100.png']
>>>
>>> sorted(l, key=lambda x: (int(x.split('__')[0][3:]), int(x.split('__')[1].split('.')[0])))
['obj1__0.png', 'obj1__10.png', 'obj1__15.png', 'obj1__100.png', 'obj2__15.png', 'obj3__15.png']

Upvotes: -1
