jerry

Reputation: 405

How to sort files in ascending order of the number contained in the file name

I have a list a of file names. I want to arrange them in ascending order of the number in each name, like this:

  1. kernal_1.0.npy
  2. kernal_10.npy
  3. kernal_50.npy
  4. kernal_100.npy

If I use os.path.splitext, it only splits off the .npy extension, and sorted only gives the order I want on integers, not on these strings. How can I sort the list by the number in each name?

import os

a = ['kernal_1.0.npy','kernal_100.npy','kernal_50.npy','kernal_10.npy' ]
b='kernal_1.0.npy'
print(os.path.splitext(b))
# ('kernal_1.0', '.npy')
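
To make the problem concrete, here is a minimal sketch of what a plain sorted call does with this list. Strings are compared character by character, so kernal_100.npy ends up before kernal_50.npy. The variable name lexicographic is only for illustration.

```python
a = ['kernal_1.0.npy', 'kernal_100.npy', 'kernal_50.npy', 'kernal_10.npy']

# Without a key, sorted() compares the names as text, character by character.
lexicographic = sorted(a)
print(lexicographic)
# ['kernal_1.0.npy', 'kernal_10.npy', 'kernal_100.npy', 'kernal_50.npy']
```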

Upvotes: 1

Views: 650

Answers (5)

IMCoins

Reputation: 3306

Since the prefix and the extension are always the same, you can slice the number out of each name by index.

a = ['kernal_1.0.npy','kernal_100.npy','kernal_50.npy','kernal_10.npy' ]
prefix_len = len('kernal_')
ext_len = len('.npy')

# The key parameter tells sorted() *how* to compare the elements: it calls
# the function on every element and sorts by the values returned. The
# lambda is a small anonymous function; it is worth reading up on.
# This line says: sort the list, comparing elements by only the characters
# between the end of the prefix and the start of the extension,
# converted to `float`.
b = sorted(a, key=lambda x: float(x[prefix_len:-ext_len]))

print(b)
# ['kernal_1.0.npy', 'kernal_10.npy', 'kernal_50.npy', 'kernal_100.npy']

Here is a more explicit illustration of what the key function does.

def show_list_based_on_lambda(arr, key):
    """ When you use the key parameter in a sorting function, it behaves
        the same way as here. Meaning at every iteration, it will
        only consider the elements returned by the function you sent in.
    """
    for elem in arr:
        print( key(elem) )


#   This function strips the first and last character off a sequence.
f = lambda x:x[1:-1]
arr = ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
show_list_based_on_lambda(arr, f)
# a
# b
# c
# d
# e


#   This function adds one to every element it receives.
f = lambda x:x+1
arr = [10, 20, 30, 40, 50]
show_list_based_on_lambda(arr, f)
# 11
# 21
# 31
# 41
# 51

Upvotes: 2

Allan

Reputation: 12438

You can use the classic approach of a custom comparison function:

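import re
from functools import cmp_to_key

def numeric_compare(x, y):
  # Extract the numbers from both names; fall back to 0 if there are none.
  u = re.findall(r"\d+(?:\.\d+)?", x)
  v = re.findall(r"\d+(?:\.\d+)?", y)
  u = [0] if len(u) == 0 else u
  v = [0] if len(v) == 0 else v
  # Return -1, 0 or 1. Truncating the difference with int() would treat
  # values less than 1 apart (e.g. 1.0 and 1.5) as equal.
  fu, fv = float(u[0]), float(v[0])
  return (fu > fv) - (fu < fv)

a = ['kernal_1.0.npy','kernal_100.npy','kernal_50.npy','kernal_10.npy' ]
print(a)
# Python 3 removed the cmp argument of sorted(); cmp_to_key adapts the function.
print(sorted(a, key=cmp_to_key(numeric_compare)))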

Output:

['kernal_1.0.npy', 'kernal_100.npy', 'kernal_50.npy', 'kernal_10.npy']
['kernal_1.0.npy', 'kernal_10.npy', 'kernal_50.npy', 'kernal_100.npy']

Explanations:

  • You define your own comparison function numeric_compare
  • You extract the real numbers from the two strings you are comparing
  • If a string does not contain any number, you set its value to 0
  • Then you compare the two extracted values as floats; the function must return an int (negative, zero or positive)
  • You call sorted() on your list with your comparison function
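
The extraction step above can be checked in isolation. This is a minimal sketch that applies the same regular expression to a few sample names; the variable name pattern is only for illustration.

```python
import re

# Same pattern as in numeric_compare: digits, optionally followed by a decimal part.
pattern = r"\d+(?:\.\d+)?"

for name in ['kernal_1.0.npy', 'kernal_100.npy', 'abc']:
    print(name, re.findall(pattern, name))
# kernal_1.0.npy ['1.0']
# kernal_100.npy ['100']
# abc []
```

A name without digits yields an empty list, which is why the comparison function needs a fallback value.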

This approach is robust: it also works on file names that do not contain any number:

input:

b = ['kernal_1.0.npy','kernal_100.npy','kernal_50.npy','kernal_10.npy', 'abc' ]

output:

['abc', 'kernal_1.0.npy', 'kernal_10.npy', 'kernal_50.npy', 'kernal_100.npy']

If you prefer files without a number to appear at the end of the list rather than at the beginning, replace u = [0] and v = [0] with u = [sys.maxsize] and v = [sys.maxsize] (and add import sys at the top of your code).
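
As a rough sketch of the same idea, here is a variant that uses a key function instead of a comparison function, which is a substitution and not the code from this answer. The helper name number_or_last is hypothetical; it returns sys.maxsize as a float when a name contains no number, so such names sort last.

```python
import re
import sys

def number_or_last(name):
    """Hypothetical helper: first number in name, or sys.maxsize if there is none."""
    found = re.findall(r"\d+(?:\.\d+)?", name)
    return float(found[0]) if found else float(sys.maxsize)

b = ['kernal_1.0.npy', 'kernal_100.npy', 'kernal_50.npy', 'kernal_10.npy', 'abc']
ordered = sorted(b, key=number_or_last)
print(ordered)
# ['kernal_1.0.npy', 'kernal_10.npy', 'kernal_50.npy', 'kernal_100.npy', 'abc']
```

This assumes no file name contains a number as large as sys.maxsize; otherwise the fallback would no longer be guaranteed to sort last.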

Regex demo and explanations:
https://regex101.com/r/evIeVD/1/

Upvotes: 0

Arkistarvh Kltzuonstev

Reputation: 6920

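Try this:

# Takes the digits between the first '_' and the first '.', so any
# fractional part (the .0 in kernal_1.0.npy) is ignored.
b = sorted(a, key = lambda x : int(x[x.find('_')+1:].split('.')[0]))

Output:

b = ['kernal_1.0.npy', 'kernal_10.npy', 'kernal_50.npy', 'kernal_100.npy']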

Upvotes: 0

Chris

Reputation: 29742

Use os.path.splitext with str.split in sorted or list.sort:

import os

a = ['kernal_1.0.npy','kernal_100.npy','kernal_50.npy','kernal_10.npy']

sorted(a, key = lambda x: float(os.path.splitext(x)[0].split('_')[1]))
# ['kernal_1.0.npy', 'kernal_10.npy', 'kernal_50.npy', 'kernal_100.npy']

Upvotes: 0

meW

Reputation: 3967

You can use a pandas Series to generalize the solution:

import numpy as np
import pandas as pd

a = np.array(['kernal_1.0.npy','kernal_100.npy','kernal_50.npy','kernal_10.npy' ])
# Take the text before the first '.', then the part after '_', convert it
# to int and keep the index order that sorts those values.
idx_ = pd.Series(a).str.split('.', expand=True).iloc[:, 0]\
        .str.split('_', expand=True).iloc[:, 1]\
        .astype(int).sort_values().index

a[idx_]
# array(['kernal_1.0.npy', 'kernal_10.npy', 'kernal_50.npy',
#        'kernal_100.npy'], dtype='<U14')

Upvotes: 0
