Maryam Nahid

Reputation: 59

Sort a list of paths in Python

How can I sort paths that contain integers as well as strings? My file names are:

tmp_1483228800-1485907200_0, 
tmp_1483228800-1485907200_1,
tmp_1483228800-1485907200_2,
.... 

I need to sort them by the integer after the last underscore. This is what my code looks like:

act = "." + "/*/raw_results.csv"
files = glob.glob(act)
sorted_list = sorted(files, key = lambda x:int(os.path.splitext(os.path.dirname(x))[0]))

I know the problem is that the names contain several integers with strings in between, so the whole name cannot be converted to an integer, but I do not know how to solve it. Thanks in advance.

Upvotes: 2

Views: 1582

Answers (4)

azz

Reputation: 1

There is a function called sort(). I would like to get the file path just to see how it works.

Upvotes: 0

Mahesh Karia

Reputation: 2055

code:

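import os
import re

PATH = r"C:\Temp"  # raw string so the backslash is never read as an escape
lst = ['tmp_1483228800-1485907200_1', 'tmp_1483228800-1485907200_0', 'tmp_1483228800-1485907200_2']

def stringSplitByNumbers(x):
    # Match the whole run of digits at the end of the name (not only the
    # last digit), so that e.g. "_10" sorts after "_9".
    return int(re.search(r'\d+$', x).group())

# Sort by the trailing number, then prepend the base path (Python 3 print).
print([os.path.join(PATH, name) for name in sorted(lst, key=stringSplitByNumbers)])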

output:

['C:\\Temp\\tmp_1483228800-1485907200_0', 'C:\\Temp\\tmp_1483228800-1485907200_1', 'C:\\Temp\\tmp_1483228800-1485907200_2']

Upvotes: 1

tobias_k

Reputation: 82889

According to your comments, your files will be in this format:

>>> files = [".../data/tmp_1483228801-1485907200_10/raw_results.csv",
             ".../data/tmp_1483228800-1485907200_1/raw_results.csv",
             ".../data/tmp_1483228801-1485907201_30/raw_results.csv",
             ".../data/tmp_1483228801-1485907200_2/raw_results.csv",
             ".../data/tmp_1483228801-1485907201_9/raw_results.csv"]

You can then just extract all the numbers in those full, raw file paths, and convert those to int. No need to split the path up into directory path segments.

>>> [[int(n) for n in re.findall(r"\d+", f)] for f in files]
[[1483228801, 1485907200, 10],
 [1483228800, 1485907200, 1],
 [1483228801, 1485907201, 30],
 [1483228801, 1485907200, 2],
 [1483228801, 1485907201, 9]]

This extracts all the numbers in the path and sorts by them as a list, giving the highest priority to the first number it finds. If the earlier numbers are all the same, the last number decides the order; if they differ, the paths are sorted by those earlier numbers first.

>>> sorted(files, key=lambda f: [int(n) for n in re.findall(r"\d+", f)])
['.../data/tmp_1483228800-1485907200_1/raw_results.csv',
 '.../data/tmp_1483228801-1485907200_2/raw_results.csv',
 '.../data/tmp_1483228801-1485907200_10/raw_results.csv',
 '.../data/tmp_1483228801-1485907201_9/raw_results.csv',
 '.../data/tmp_1483228801-1485907201_30/raw_results.csv']

If that's not what you want, you can use the (slightly wasteful) key=lambda f: [int(n) for n in re.findall(r"\d+", f)][-1] to only sort by the last number.
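As a rough sketch of that last-number key, written as a named function so the difference in ordering is easier to see. The helper name last_number is made up for illustration, and the sample paths are the ones from this answer. It assumes the file name itself (raw_results.csv) contains no digits; otherwise the last number would come from the file name instead of the directory.

```python
import re

files = [".../data/tmp_1483228801-1485907200_10/raw_results.csv",
         ".../data/tmp_1483228800-1485907200_1/raw_results.csv",
         ".../data/tmp_1483228801-1485907201_30/raw_results.csv",
         ".../data/tmp_1483228801-1485907200_2/raw_results.csv",
         ".../data/tmp_1483228801-1485907201_9/raw_results.csv"]

def last_number(path):
    # Hypothetical helper: find every run of digits, keep only the final one.
    return int(re.findall(r"\d+", path)[-1])

# Orders the paths by the trailing index alone, ignoring the two timestamps.
by_last = sorted(files, key=last_number)
```

With this key the two timestamps no longer influence the order, so paths with different timestamps are interleaved purely by their trailing index.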

Upvotes: 2

Chris_Rands

Reputation: 41168

You could simply use str.rsplit() for the key:

>>> lst = ['tmp_1483228800-1485907200_1', 'tmp_1483228800-1485907200_2','tmp_1483228800-1485907200_0']
>>> sorted(lst, key=lambda x: int(x.rsplit('_', 1)[-1]))
['tmp_1483228800-1485907200_0', 'tmp_1483228800-1485907200_1', 'tmp_1483228800-1485907200_2']
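The question sorts full paths returned by glob rather than bare directory names, so the same idea might be applied to the directory part of each path. This is only a sketch: the sample paths below are made up to match the shape of glob.glob("./*/raw_results.csv"), and dir_index is a hypothetical helper name.

```python
import os

# Hypothetical paths in the shape glob.glob("./*/raw_results.csv") would return.
files = ["./tmp_1483228800-1485907200_10/raw_results.csv",
         "./tmp_1483228800-1485907200_2/raw_results.csv",
         "./tmp_1483228800-1485907200_0/raw_results.csv"]

def dir_index(path):
    # Directory part of the path, e.g. "./tmp_1483228800-1485907200_10"
    directory = os.path.dirname(path)
    # Text after the last underscore, converted to int for numeric ordering
    return int(directory.rsplit("_", 1)[-1])

sorted_files = sorted(files, key=dir_index)
```

Converting to int matters here: comparing the suffixes as strings would place "10" before "2".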

Upvotes: 2
