user15051990

Reputation: 1905

Sorting filepaths by extension in python

I want to sort this list so that the .log file comes first and the .gz files follow in descending order.

my_list = [
     '/abc/a.log.1.gz',
     '/abc/a.log',
     '/abc/a.log.gz',
     '/abc/a.log.30.gz',
     '/abc/a.log.2.gz',
     '/abc/a.log.5.gz',
     '/abc/a.log.3.gz',
     '/abc/a.log.6.gz',
     '/abc/a.log.4.gz',
     '/abc/a.log.12.gz',
     '/abc/a.log.10.gz',
     '/abc/a.log.8.gz',
     '/abc/a.log.14.gz',
     '/abc/a.log.29.gz'
]

Expected Result:

my_list = ['/abc/a.log',
        '/abc/a.log.gz',
        '/abc/a.log.30.gz',
        '/abc/a.log.29.gz',
        '/abc/a.log.14.gz',
        '/abc/a.log.12.gz',
        '/abc/a.log.10.gz',
        '/abc/a.log.8.gz',
        '/abc/a.log.6.gz',
        '/abc/a.log.5.gz',
        '/abc/a.log.4.gz',
        '/abc/a.log.3.gz',
        '/abc/a.log.2.gz',
        '/abc/a.log.1.gz']

My solution:

import os

def get_sort_keys(filepath):
    split_file_path = os.path.splitext(filepath)
    sort_key = (split_file_path[1], *os.path.splitext(split_file_path[0]))
    return (sort_key[0], sort_key[1], int(sort_key[2].strip(".")) if sort_key[2] else 0)

print(sorted(my_list, key=get_sort_keys, reverse=True))

Running it raises this error:

ValueError: invalid literal for int() with base 10: 'log'

Upvotes: 1

Views: 177

Answers (3)

cs95

Reputation: 403278

You can use sorted with a custom key function that checks whether the second-to-last dot-separated component is a number.

def try_convert(x):
    y = x.rsplit('.', 2)[-2]
    return ('log' not in x, int(y) if y.isdigit() else float('inf'), x)

sorted(my_list, key=try_convert, reverse=True)

['/abc/a.log.gz',
 '/abc/a.log',
 '/abc/a.log.30.gz',
 '/abc/a.log.29.gz',
 '/abc/a.log.14.gz',
 '/abc/a.log.12.gz',
 '/abc/a.log.10.gz',
 '/abc/a.log.8.gz',
 '/abc/a.log.6.gz',
 '/abc/a.log.5.gz',
 '/abc/a.log.4.gz',
 '/abc/a.log.3.gz',
 '/abc/a.log.2.gz',
 '/abc/a.log.1.gz']

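The function gives filenames without an integer component a middle key of float('inf'), so they are ordered last in ascending order (first with reverse=True). The leading 'log' not in x flag is True only for paths that do not contain "log", and with reverse=True those would come first. Every path here contains "log", so the flag has no effect, and '/abc/a.log.gz' lands ahead of '/abc/a.log' because the tie is broken on the full path string.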
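A minimal variant, assuming the goal is exactly the expected order from the question: replacing the first key element with path.endswith('.log') puts the uncompressed live log first under reverse=True, while the rest of the key stays the same. The name rotation_key is hypothetical.

```python
def rotation_key(path):
    # Second-to-last dot-separated piece: '30' for '/abc/a.log.30.gz', 'log' for '/abc/a.log.gz'
    middle = path.rsplit('.', 2)[-2]
    return (
        path.endswith('.log'),  # True only for the live log, so it sorts first with reverse=True
        int(middle) if middle.isdigit() else float('inf'),  # unnumbered names before numbered ones
        path,  # final tie-breaker on the full path
    )

my_list = ['/abc/a.log.1.gz', '/abc/a.log', '/abc/a.log.gz', '/abc/a.log.30.gz',
           '/abc/a.log.2.gz', '/abc/a.log.5.gz', '/abc/a.log.3.gz', '/abc/a.log.6.gz',
           '/abc/a.log.4.gz', '/abc/a.log.12.gz', '/abc/a.log.10.gz', '/abc/a.log.8.gz',
           '/abc/a.log.14.gz', '/abc/a.log.29.gz']

result = sorted(my_list, key=rotation_key, reverse=True)
print(result)
```

Here '/abc/a.log' gets (True, inf, ...), '/abc/a.log.gz' gets (False, inf, ...), and the numbered archives get (False, N, ...), so the descending sort yields .log, then .log.gz, then 30 down to 1.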

Upvotes: 2

U13-Forward

Reputation: 71620

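Or use the following (written before the question was edited to add '/abc/a.log'; on the current list, x.split('.')[2] raises IndexError for that path because it splits into only two pieces):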

>>> sorted(my_list,key=lambda x: int(x.split('.')[2]) if x.split('.')[2].isdigit() else 31,reverse=True)
['/abc/a.log.gz', '/abc/a.log.30.gz', '/abc/a.log.29.gz', '/abc/a.log.14.gz', '/abc/a.log.12.gz', '/abc/a.log.10.gz', '/abc/a.log.8.gz', '/abc/a.log.6.gz', '/abc/a.log.5.gz', '/abc/a.log.4.gz', '/abc/a.log.3.gz', '/abc/a.log.2.gz', '/abc/a.log.1.gz']
>>> 

For the updated question, index from the end with [-2] instead of [2]:

>>> sorted(my_list,key=lambda x: int(x.split('.')[-2]) if x.split('.')[-2].isdigit() else 31,reverse=True)
['/abc/a.log', '/abc/a.log.gz', '/abc/a.log.30.gz', '/abc/a.log.29.gz', '/abc/a.log.14.gz', '/abc/a.log.12.gz', '/abc/a.log.10.gz', '/abc/a.log.8.gz', '/abc/a.log.6.gz', '/abc/a.log.5.gz', '/abc/a.log.4.gz', '/abc/a.log.3.gz', '/abc/a.log.2.gz', '/abc/a.log.1.gz']
>>> 
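Two caveats with the lambda above: the sentinel 31 only works while no rotation index exceeds 30, and '/abc/a.log' lands ahead of '/abc/a.log.gz' only because both get the key 31 and sorted is stable (reverse=True preserves the original order of equal keys), so their order in my_list decides the result. A sketch that avoids both, assuming every name follows the pattern base.log[.N][.gz] (the regex and the name log_rotation_key are hypothetical), ranks each path explicitly and sorts ascending without reverse.

```python
import re

# Hypothetical pattern: ".log", then an optional ".N" index, then an optional ".gz", at the end
ROTATED = re.compile(r'\.log(?:\.(\d+))?(\.gz)?$')

def log_rotation_key(path):
    """Ascending key: live .log, then unnumbered .log.gz, then .log.N[.gz] by descending N."""
    m = ROTATED.search(path)
    if m is None:
        return (3, 0, path)  # names that do not match the pattern go last
    number, gz = m.groups()
    if number is None and gz is None:
        return (0, 0, path)  # the uncompressed live log
    if number is None:
        return (1, 0, path)  # compressed, but without an index
    return (2, -int(number), path)  # negating the index puts higher numbers earlier

my_list = ['/abc/a.log.1.gz', '/abc/a.log', '/abc/a.log.gz', '/abc/a.log.30.gz',
           '/abc/a.log.2.gz', '/abc/a.log.5.gz', '/abc/a.log.3.gz', '/abc/a.log.6.gz',
           '/abc/a.log.4.gz', '/abc/a.log.12.gz', '/abc/a.log.10.gz', '/abc/a.log.8.gz',
           '/abc/a.log.14.gz', '/abc/a.log.29.gz']

result = sorted(my_list, key=log_rotation_key)
print(result)
```

Because the rank is explicit, the result no longer depends on the input order or on a maximum index.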

Upvotes: 1

JBurt

Reputation: 48

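You're passing the string 'log' into int(). For '/abc/a.log.gz', the first os.path.splitext call strips '.gz', and the second returns '.log' as the extension, so sort_key[2].strip(".") is 'log'.

int() can't convert that, so it raises ValueError: invalid literal for int() with base 10: 'log'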

This happens on the line return (sort_key[0], sort_key[1], int(sort_key[2].strip(".")) if sort_key[2] else 0)
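To see which path triggers the error, the two splitext calls from the question can be run on their own. This is a small diagnostic sketch; split_twice is a hypothetical helper that mirrors get_sort_keys without the int() conversion.

```python
import os

def split_twice(filepath):
    """Mirror the question's two os.path.splitext calls (hypothetical helper)."""
    root, ext = os.path.splitext(filepath)
    stem, middle = os.path.splitext(root)
    return ext, stem, middle

for path in ['/abc/a.log', '/abc/a.log.1.gz', '/abc/a.log.gz']:
    print(path, '->', split_twice(path))

# /abc/a.log -> ('.log', '/abc/a', '')
# /abc/a.log.1.gz -> ('.gz', '/abc/a.log', '.1')
# /abc/a.log.gz -> ('.gz', '/abc/a', '.log')
```

Only '/abc/a.log.gz' produces a non-numeric, non-empty middle part ('.log'), which is what reaches int().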

Try wrapping the conversion in a try/except block.
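A minimal sketch of that suggestion, assuming all paths share one base name (here /abc/a), so the stem is dropped from the key. Non-numeric middle parts fall back to float('inf'), which puts '/abc/a.log.gz' ahead of the numbered archives in a descending sort.

```python
import os

def get_sort_keys(filepath):
    # Same two-step split as the question: '/abc/a.log.3.gz' -> ext '.gz', middle '.3'
    root, ext = os.path.splitext(filepath)
    _stem, middle = os.path.splitext(root)
    try:
        number = int(middle.strip('.'))
    except ValueError:
        # middle is '' for '/abc/a.log' and '.log' for '/abc/a.log.gz'
        number = float('inf')
    # Stem left out on the assumption that every path has the same base name
    return (ext, number)

my_list = ['/abc/a.log.1.gz', '/abc/a.log', '/abc/a.log.gz', '/abc/a.log.30.gz',
           '/abc/a.log.2.gz', '/abc/a.log.5.gz', '/abc/a.log.3.gz', '/abc/a.log.6.gz',
           '/abc/a.log.4.gz', '/abc/a.log.12.gz', '/abc/a.log.10.gz', '/abc/a.log.8.gz',
           '/abc/a.log.14.gz', '/abc/a.log.29.gz']

result = sorted(my_list, key=get_sort_keys, reverse=True)
print(result)
```

Note that .log coming first here relies on the string '.log' comparing greater than '.gz'; with other extensions an explicit rank would be safer.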

Upvotes: 1
