Hocine Ben
Hocine Ben

Reputation: 2279

sort files in a specific order

I want to know how I can sort the filenames as they are in the directory. For example, I have the following names:

1_00000_6.54.csv
2_00000_1.70.csv
3_00000_1.70.csv
...
10_00000_1.70.csv
11_00000_1.70.csv
...

With the following python code I get the following order:

 def get_pixelist(path):
     return [os.path.join(path,f) for f in os.listdir(path) if f.endswith('.csv')]

 def group_uniqmz_intensities(path):
     pxlist = sorted(get_pixelist(path))

gives:

1_00000_6.54.csv
10_00000_1.70.csv
11_00000_1.70
...
2_00000_1.70.csv
...
3_00000_1.70.csv
...

I want the order shown before.

Upvotes: 2

Views: 322

Answers (4)

Martijn Pieters
Martijn Pieters

Reputation: 1121914

The easiest would be to zero-pad the filenames when sorting:

def group_uniqmz_intensities(path):
    pxlist = sorted(get_pixelist(path), key=lambda f: f.rjust(17, '0'))

which will pad each filename to 17 characters with 0 characters when sorting; so 1_00000_6.54.csv is padded to 01_00000_6.54.csv while 10_00000_1.70.csv is left as is. Lexographically, 01 sorts before 10.

I picked 17 as a hardcoded value to simplify things; you could find the required value automatically by using this instead:

def group_uniqmz_intensities(path):
    padsize = max(len(f) for f in pxlist)
    pxlist = sorted(get_pixelist(path), key=lambda f: f.rjust(padsize, '0'))

Upvotes: 2

Kos
Kos

Reputation: 72261

Here's a trivial implementation of natural ordering, assuming that your fields are all split by _:

def int_if_possible(s):
    try:
        return int(s)
    except:
        return s


>>> sorted(s, key=lambda s: map(int_if_possible, s.split('_')))
['1_00000_6.54.csv',
 '2_00000_1.70.csv',
 '3_00000_1.70.csv',
 '10_00000_1.70.csv',
 '11_00000_1.70.csv']

This implementation leverages the fact that lists get compared element-by-element. If the elements are convertible to ints, we compare them as ints, otherwise we fall back to string comparison.


Edit: A more elaborate solution for natural sorting is presented here: Natural string sorting.

It's pretty clever: it uses a regex \d+\D+ to split input strings into alternating numbers and non-numbers. Then numbers are compared numerically, and non-numbers alphabetically.

Upvotes: 0

BioGeek
BioGeek

Reputation: 22827

Based on this answer for alphanumerical sorting:

def group_uniqmz_intensities(path):
    pxlist = sorted(get_pixelist(path), key=lambda filename: int(filename.partition('_')[0]))

Upvotes: 0

Dimitri Vorona
Dimitri Vorona

Reputation: 480

Since '1' < '_' you get the second ordering. You can achieve your goal by giving a key-function to sorted:

 def group_uniqmz_intensities(path):
     pxlist = sorted(get_pixelist(path), key=lambda x: int(x.split("_")[0]))

Please make sure ALL of your files are following the same naming scheme ({number}_{rest}.csv) otherwise there will be a ValueError.

EDIT: Martijn Pieters provides a more elegant solution.

Upvotes: 0

Related Questions