Reputation: 2279
I want to know how I can sort the filenames as they are in the directory. For example, I have the following names:
1_00000_6.54.csv
2_00000_1.70.csv
3_00000_1.70.csv
...
10_00000_1.70.csv
11_00000_1.70.csv
...
With the following Python code I get a different order:
import os

def get_pixelist(path):
    # Full paths of all .csv files directly inside path.
    return [os.path.join(path, f) for f in os.listdir(path) if f.endswith('.csv')]

def group_uniqmz_intensities(path):
    pxlist = sorted(get_pixelist(path))
gives:
1_00000_6.54.csv
10_00000_1.70.csv
11_00000_1.70.csv
...
2_00000_1.70.csv
...
3_00000_1.70.csv
...
I want the order shown before.
Upvotes: 2
Views: 322
Reputation: 1121914
The easiest would be to zero-pad the filenames when sorting:
def group_uniqmz_intensities(path):
    # Pad the base name, not the full path that get_pixelist() returns.
    pxlist = sorted(get_pixelist(path), key=lambda f: os.path.basename(f).rjust(17, '0'))
This pads each filename to 17 characters with '0' characters when sorting, so 1_00000_6.54.csv is padded to 01_00000_6.54.csv while 10_00000_1.70.csv is left as is. Lexicographically, 01 sorts before 10.
I picked 17 as a hardcoded value to simplify things; you could find the required value automatically by using this instead:
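def group_uniqmz_intensities(path):
    pxlist = get_pixelist(path)
    # Width of the longest base name; assumes at least one .csv file exists.
    padsize = max(len(os.path.basename(f)) for f in pxlist)
    pxlist = sorted(pxlist, key=lambda f: os.path.basename(f).rjust(padsize, '0'))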
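As a quick check of the padding idea, here is a small self-contained sketch. The directory name data and the helper name padded_sort are made up for illustration, and the padding is applied to the base name so that a leading directory does not interfere.

```python
import os

def padded_sort(paths):
    """Sort paths by base name, left-padded with '0' to a common width."""
    names = [os.path.basename(p) for p in paths]
    # default=0 keeps an empty input from raising ValueError.
    padsize = max((len(n) for n in names), default=0)
    return sorted(paths, key=lambda p: os.path.basename(p).rjust(padsize, '0'))

# Hypothetical input, using the names from the question in shuffled order.
paths = [os.path.join('data', n) for n in (
    '10_00000_1.70.csv', '1_00000_6.54.csv', '3_00000_1.70.csv',
    '11_00000_1.70.csv', '2_00000_1.70.csv')]
ordered = [os.path.basename(p) for p in padded_sort(paths)]
print(ordered)
```

Note that padding only helps when the part after the leading number has the same length in every name. Two names such as 1_00000_16.54.csv and 10_00000_1.70.csv are already the same length, so no padding is added and they still compare character by character.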
Upvotes: 2
Reputation: 72261
Here's a trivial implementation of natural ordering, assuming that your fields are all separated by _:
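def int_if_possible(s):
    # Return s as an int when it is purely numeric, otherwise unchanged.
    try:
        return int(s)
    except ValueError:
        return s

>>> names = ['1_00000_6.54.csv', '10_00000_1.70.csv', '11_00000_1.70.csv',
...          '2_00000_1.70.csv', '3_00000_1.70.csv']
>>> # Build a list key: on Python 3, map() returns an iterator, which cannot be compared.
>>> sorted(names, key=lambda name: [int_if_possible(part) for part in name.split('_')])
['1_00000_6.54.csv',
 '2_00000_1.70.csv',
 '3_00000_1.70.csv',
 '10_00000_1.70.csv',
 '11_00000_1.70.csv']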
This implementation leverages the fact that lists get compared element-by-element. If the elements are convertible to ints, we compare them as ints, otherwise we fall back to string comparison.
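The element-by-element comparison can be seen directly in a few lines. This is only an illustration with made-up values. It also shows a caveat on Python 3: if one key has an int and another a str in the same position, the comparison fails instead of falling back to string order.

```python
# Lists compare position by position, so integer fields compare numerically.
numeric_order = [2, 0, '1.70.csv'] < [10, 0, '1.70.csv']

# Plain strings compare character by character, and '1' < '2'.
string_order = '10_00000_1.70.csv' < '2_00000_1.70.csv'

# On Python 3, an int and a str in the same position cannot be ordered.
try:
    [1, 'a'] < ['x', 'a']
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(numeric_order, string_order, mixed_ok)
```

So the key works as long as each field position is either numeric in every filename or non-numeric in every filename.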
Edit: A more elaborate solution for natural sorting is presented here: Natural string sorting.
It's pretty clever: it uses a regex \d+\D+ to split input strings into alternating numbers and non-numbers. Then numbers are compared numerically, and non-numbers alphabetically.
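A regex-based natural key along those lines could look like the sketch below. It is not the code from the linked answer: it uses re.split with the pattern (\d+), and the name natural_key is chosen here for illustration.

```python
import re

def natural_key(text):
    """Split text into alternating non-digit and digit runs; digit runs become ints."""
    # With one capturing group, re.split puts the captured digit runs at odd indices.
    parts = re.split(r'(\d+)', text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ['1_00000_6.54.csv', '10_00000_1.70.csv', '11_00000_1.70.csv',
         '2_00000_1.70.csv', '3_00000_1.70.csv']
natural = sorted(names, key=natural_key)
print(natural)
```

Because strings always land at even positions and ints at odd positions, two keys never compare an int against a str. Note that 6.54 is treated as the integers 6 and 54 separated by a dot, not as a decimal number.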
Upvotes: 0
Reputation: 22827
Based on this answer for alphanumerical sorting:
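def group_uniqmz_intensities(path):
    # Take the number before the first '_' of the base name (get_pixelist() returns full paths).
    pxlist = sorted(get_pixelist(path), key=lambda filename: int(os.path.basename(filename).partition('_')[0]))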
Upvotes: 0
Reputation: 480
Strings are compared character by character, and since '1' < '_' you get the second ordering. You can get the order you want by passing a key function to sorted:
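def group_uniqmz_intensities(path):
    # Sort by the integer before the first '_' of the base name (get_pixelist() returns full paths).
    pxlist = sorted(get_pixelist(path), key=lambda x: int(os.path.basename(x).split("_")[0]))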
Please make sure ALL of your files follow the same naming scheme ({number}_{rest}.csv); otherwise int() will raise a ValueError.
EDIT: Martijn Pieters provides a more elegant solution.
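If some files might not follow the {number}_{rest}.csv scheme, one possible way to avoid the ValueError is to catch it inside the key function and sort those files last. This is only a sketch: the helper name leading_number_key and the sample paths are invented for illustration.

```python
import os

def leading_number_key(path):
    """Numbered names sort first, by number; all other names sort last, by name."""
    name = os.path.basename(path)
    prefix = name.partition('_')[0]
    try:
        return (0, int(prefix), name)
    except ValueError:
        # No integer prefix: group these after the numbered files.
        return (1, 0, name)

# Hypothetical directory contents, one of which breaks the naming scheme.
paths = [os.path.join('data', n) for n in ('10_a.csv', 'notes.csv', '2_b.csv')]
result = [os.path.basename(p) for p in sorted(paths, key=leading_number_key)]
print(result)
```

The first tuple element separates the two groups, so an int is never compared with a str.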
Upvotes: 0