Reputation: 59
How can I sort paths that contain integers as well as strings? My file names are:
tmp_1483228800-1485907200_0,
tmp_1483228800-1485907200_1,
tmp_1483228800-1485907200_2,
....
I need to sort them by the integer after the last underscore. This is what my code looks like:
act = "." + "/*/raw_results.csv"
files = glob.glob(act)
sorted_list = sorted(files, key = lambda x:int(os.path.splitext(os.path.dirname(x))[0]))
I know the problem is that the names contain several integers with strings in between, so the whole name cannot be converted to an integer, but I do not know how to solve it. Thanks in advance.
Upvotes: 2
Views: 1582
Reputation: 1
There is a function called sort(). I want to get the file path just to see how it works.
Upvotes: 0
Reputation: 2055
code:
import os
import re

PATH = r"C:\Temp"
lst = ['tmp_1483228800-1485907200_1', 'tmp_1483228800-1485907200_0', 'tmp_1483228800-1485907200_2']

def stringSplitByNumbers(x):
    # Sort key: the run of digits at the very end of the name, as an int.
    # r'\d+$' matches the whole trailing number, not just its last digit.
    return int(re.findall(r'\d+$', x)[0])

print([os.path.join(PATH, name) for name in sorted(lst, key=stringSplitByNumbers)])
output:
['C:\\Temp\\tmp_1483228800-1485907200_0', 'C:\\Temp\\tmp_1483228800-1485907200_1', 'C:\\Temp\\tmp_1483228800-1485907200_2']
Upvotes: 1
Reputation: 82889
According to your comments, your files will be in this format:
>>> files = [".../data/tmp_1483228801-1485907200_10/raw_results.csv",
".../data/tmp_1483228800-1485907200_1/raw_results.csv",
".../data/tmp_1483228801-1485907201_30/raw_results.csv",
".../data/tmp_1483228801-1485907200_2/raw_results.csv",
".../data/tmp_1483228801-1485907201_9/raw_results.csv"]
You can then just extract all the numbers in those full, raw file paths and convert them to int. No need to split the path up into directory path segments.
>>> [[int(n) for n in re.findall(r"\d+", f)] for f in files]
[[1483228801, 1485907200, 10],
[1483228800, 1485907200, 1],
[1483228801, 1485907201, 30],
[1483228801, 1485907200, 2],
[1483228801, 1485907201, 9]]
Used as a sort key, this list of numbers gives the highest priority to the first number in the path. If the earlier numbers are all the same, the last number decides the order; if they differ, the paths are sorted by those first.
>>> sorted(files, key=lambda f: [int(n) for n in re.findall(r"\d+", f)])
['.../data/tmp_1483228800-1485907200_1/raw_results.csv',
'.../data/tmp_1483228801-1485907200_2/raw_results.csv',
'.../data/tmp_1483228801-1485907200_10/raw_results.csv',
'.../data/tmp_1483228801-1485907201_9/raw_results.csv',
'.../data/tmp_1483228801-1485907201_30/raw_results.csv']
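This key works because Python compares lists element by element, so a tie on one number falls through to the next. A minimal sketch of that comparison, using a hypothetical helper named numeric_key (not part of the answer above) and two of the sample paths:

```python
import re

def numeric_key(path):
    # Every run of digits in the path, converted to int, in order of appearance.
    return [int(n) for n in re.findall(r"\d+", path)]

a = ".../data/tmp_1483228801-1485907200_2/raw_results.csv"
b = ".../data/tmp_1483228801-1485907200_10/raw_results.csv"

# Lists compare element by element: the first two numbers tie, then 2 < 10.
print(numeric_key(a) < numeric_key(b))  # True
# Plain string comparison gets this pair wrong, because "2" sorts after "1".
print(a < b)  # False
```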
If that's not what you want, you can use the (slightly wasteful) key=lambda f: [int(n) for n in re.findall(r"\d+", f)][-1] to only sort by the last number.
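Sorting only by the last number could also be written so that just the final match is converted to int. This is a sketch, not the answer's own code: last_number is a hypothetical name, and it assumes the file name itself (raw_results.csv here) contains no digits, since otherwise those would be picked up as the last number.

```python
import re

files = [".../data/tmp_1483228801-1485907200_10/raw_results.csv",
         ".../data/tmp_1483228800-1485907200_1/raw_results.csv",
         ".../data/tmp_1483228801-1485907201_30/raw_results.csv",
         ".../data/tmp_1483228801-1485907200_2/raw_results.csv",
         ".../data/tmp_1483228801-1485907201_9/raw_results.csv"]

def last_number(path):
    # Find every run of digits, but convert only the final one to int.
    return int(re.findall(r"\d+", path)[-1])

by_last = sorted(files, key=last_number)
for f in by_last:
    print(f)
```

With the sample paths, the resulting order of the trailing numbers is 1, 2, 9, 10, 30.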
Upvotes: 2
Reputation: 41168
You could simply use str.rsplit() for the key:
>>> lst = ['tmp_1483228800-1485907200_1', 'tmp_1483228800-1485907200_2','tmp_1483228800-1485907200_0']
>>> sorted(lst, key=lambda x: int(x.rsplit('_', 1)[-1]))
['tmp_1483228800-1485907200_0', 'tmp_1483228800-1485907200_1', 'tmp_1483228800-1485907200_2']
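The list above holds bare directory names, while the question's glob call returns paths ending in raw_results.csv. One way to bridge the two, as a sketch, is to take the parent directory name first and then apply rsplit. The paths and the helper name dir_index below are made up for illustration.

```python
import os

# Hypothetical paths shaped like the results of glob("./*/raw_results.csv").
files = [
    "./tmp_1483228800-1485907200_10/raw_results.csv",
    "./tmp_1483228800-1485907200_2/raw_results.csv",
    "./tmp_1483228800-1485907200_0/raw_results.csv",
]

def dir_index(path):
    # Name of the directory that holds the file,
    # e.g. "tmp_1483228800-1485907200_10".
    folder = os.path.basename(os.path.dirname(path))
    # Everything after the last underscore, as an int.
    return int(folder.rsplit("_", 1)[-1])

sorted_files = sorted(files, key=dir_index)
print(sorted_files)
```

Because the key is an int, _10 sorts after _2 rather than before it.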
Upvotes: 2