Kancha

Reputation: 499

Natural Sort of list containing paths in Python

I have a list, paths_list, which contains the paths of files (images) in a particular folder. Example:

['/home/username/images/s1/4.jpg', '/home/username/images/s1/7.jpg', 
'/home/username/images/s1/6.jpg', '/home/username/images/s1/3.jpg', 
'/home/username/images/s1/5.jpg', '/home/username/images/s1/10.jpg', 
'/home/username/images/s1/9.jpg', '/home/username/images/s1/1.jpg', 
'/home/username/images/s1/2.jpg', '/home/username/images/s1/12.jpg', 
'/home/username/images/s1/11.jpg', '/home/username/images/s1/8.jpg']

I want to sort them in the order [/1.jpg, /2.jpg, ..., /12.jpg]. Neither sorting by length nor sorting alphabetically helps. What should be done here?

Upvotes: 9

Views: 19112

Answers (7)

Tyler

Reputation: 500

To piggyback off of Shir's answer, if your file names are version numbers such as 1.0.ext, 2.3.4.ext, 3.0.ext, you can use:

import re
from pathlib import Path

files = Path('/your/path/here').glob('*.ext')

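# Keep only stems made entirely of dot-separated integers (e.g. "1.0", "2.3.4"),
# so that the int() conversion in the sort key below cannot fail.
files = [
    f for f in files
    if re.fullmatch(r"[0-9]+(\.[0-9]+)+", f.stem)
]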

files = sorted(
    files,
    key=lambda s: [int(u) for u in s.stem.split('.')]
)
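
As a small in-memory sketch of the same key (the file names and the version_key helper below are made up for illustration, and no directory is read), this shows why comparing lists of ints gives the expected version order:

```python
from pathlib import PurePosixPath

# Hypothetical file names; nothing needs to exist on disk.
names = ["2.3.4.ext", "10.0.ext", "1.0.ext", "2.10.ext", "2.3.ext"]

def version_key(name):
    # "2.3.4.ext" -> stem "2.3.4" -> [2, 3, 4]
    # Lists compare element by element, so [2, 3, 4] < [2, 10] < [10, 0].
    return [int(part) for part in PurePosixPath(name).stem.split(".")]

sorted_names = sorted(names, key=version_key)
print(sorted_names)
```

A plain string sort would place "10.0.ext" before "2.3.ext", which is what the integer lists avoid.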

Upvotes: 0

Eric O.

Reputation: 583

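I find this neat. Note that sorting on Path(...).name compares strings, so '10.jpg' would still come before '2.jpg'; converting the stem to an int gives the numeric order:

from pathlib import Path  # pathlib is part of the standard library
# files is the list of path strings from the question
sorted_files = sorted(files, key=lambda image_path: int(Path(image_path).stem))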

Upvotes: 0

Cory Kramer

Reputation: 117856

You can use sorted with a lambda. For the sort key, use os.path to first pull out just the file name (using basename), then strip the extension from it (using splitext).

Lastly convert to int so you sort numerically instead of lexicographically.

>>> import os
>>> l = ['/home/username/images/s1/4.jpg', '/home/username/images/s1/7.jpg', '/home/username/images/s1/6.jpg', '/home/username/images/s1/3.jpg', '/home/username/images/s1/5.jpg', '/home/username/images/s1/10.jpg', '/home/username/images/s1/9.jpg', '/home/username/images/s1/1.jpg', '/home/username/images/s1/2.jpg', '/home/username/images/s1/12.jpg', '/home/username/images/s1/11.jpg', '/home/username/images/s1/8.jpg']
>>> sorted(l, key=lambda i: int(os.path.splitext(os.path.basename(i))[0]))
['/home/username/images/s1/1.jpg',
 '/home/username/images/s1/2.jpg',
 '/home/username/images/s1/3.jpg',
 '/home/username/images/s1/4.jpg',
 '/home/username/images/s1/5.jpg',
 '/home/username/images/s1/6.jpg',
 '/home/username/images/s1/7.jpg',
 '/home/username/images/s1/8.jpg',
 '/home/username/images/s1/9.jpg',
 '/home/username/images/s1/10.jpg',
 '/home/username/images/s1/11.jpg',
 '/home/username/images/s1/12.jpg']
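
If the folder might also contain files whose names are not plain integers, int() raises ValueError. A minimal sketch of one way around that, assuming it is acceptable to list such files after the numbered ones (the helper name numeric_stem_key and the thumbs.db entry are hypothetical):

```python
import os

def numeric_stem_key(path):
    # Strip the directory and the extension: '/a/b/10.jpg' -> '10'
    stem = os.path.splitext(os.path.basename(path))[0]
    try:
        # Numbered files sort first, by numeric value.
        return (0, int(stem))
    except ValueError:
        # Anything else sorts afterwards, alphabetically by stem.
        return (1, stem)

paths = ['/home/username/images/s1/10.jpg',
         '/home/username/images/s1/2.jpg',
         '/home/username/images/s1/thumbs.db',
         '/home/username/images/s1/1.jpg']

result = sorted(paths, key=numeric_stem_key)
print(result)
```

The leading 0 or 1 in each tuple decides the comparison between the two groups, so an int is never compared with a str.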

Upvotes: 19

Shir

Reputation: 1649

Inspired by @Cory Kramer's answer, you can use the pathlib library and get a natural sort of the paths:

from pathlib import Path

a = ['/home/username/images/s1/4.jpg', 
     '/home/username/images/s1/7.jpg', 
     '/home/username/images/s1/6.jpg', 
     '/home/username/images/s1/3.jpg', 
     '/home/username/images/s1/5.jpg', 
     '/home/username/images/s1/10.jpg', 
     '/home/username/images/s1/9.jpg', 
     '/home/username/images/s1/1.jpg', 
     '/home/username/images/s1/2.jpg', 
     '/home/username/images/s1/12.jpg', 
     '/home/username/images/s1/11.jpg', 
     '/home/username/images/s1/8.jpg']

a = [Path(i) for i in a]
sorted_a = sorted(a, key=lambda i: int(i.stem))
sorted_a = [str(i) for i in sorted_a]  # convert the sorted Path objects back to strings

output:

['/home/username/images/s1/1.jpg',
 '/home/username/images/s1/2.jpg',
 '/home/username/images/s1/3.jpg',
 '/home/username/images/s1/4.jpg',
 '/home/username/images/s1/5.jpg',
 '/home/username/images/s1/6.jpg',
 '/home/username/images/s1/7.jpg',
 '/home/username/images/s1/8.jpg',
 '/home/username/images/s1/9.jpg',
 '/home/username/images/s1/10.jpg',
 '/home/username/images/s1/11.jpg',
 '/home/username/images/s1/12.jpg']

In general, using pathlib can sometimes give cleaner code than plain os.path.

Upvotes: 5

Tbaki

Reputation: 1003

You can split on "/", take the last element, split that on ".", take the first element, and convert it to an int:

l = ['/home/username/images/s1/4.jpg', '/home/username/images/s1/7.jpg', '/home/username/images/s1/6.jpg', '/home/username/images/s1/3.jpg', '/home/username/images/s1/5.jpg', '/home/username/images/s1/10.jpg', '/home/username/images/s1/9.jpg', '/home/username/images/s1/1.jpg', '/home/username/images/s1/2.jpg', '/home/username/images/s1/12.jpg', '/home/username/images/s1/11.jpg', '/home/username/images/s1/8.jpg']
sorted_list = sorted(l, key = lambda x: int(x.split("/")[-1].split(".")[0]))

output

['/home/username/images/s1/1.jpg',
 '/home/username/images/s1/2.jpg',
 '/home/username/images/s1/3.jpg',
 '/home/username/images/s1/4.jpg',
 '/home/username/images/s1/5.jpg',
 '/home/username/images/s1/6.jpg',
 '/home/username/images/s1/7.jpg',
 '/home/username/images/s1/8.jpg',
 '/home/username/images/s1/9.jpg',
 '/home/username/images/s1/10.jpg',
 '/home/username/images/s1/11.jpg',
 '/home/username/images/s1/12.jpg']

Upvotes: 1

VinceP

Reputation: 2163

Use natural sorting (see this question) with the third-party natsort package: clean code and good practice when sorting strings.

from natsort import natsorted
l = ['/home/username/images/s1/4.jpg', '/home/username/images/s1/7.jpg', '/home/username/images/s1/6.jpg', '/home/username/images/s1/3.jpg', '/home/username/images/s1/5.jpg', '/home/username/images/s1/10.jpg', '/home/username/images/s1/9.jpg', '/home/username/images/s1/1.jpg', '/home/username/images/s1/2.jpg', '/home/username/images/s1/12.jpg', '/home/username/images/s1/11.jpg', '/home/username/images/s1/8.jpg']
natsorted(l)

gives

['/home/username/images/s1/1.jpg',
'/home/username/images/s1/2.jpg',
'/home/username/images/s1/3.jpg',
'/home/username/images/s1/4.jpg',
'/home/username/images/s1/5.jpg',
'/home/username/images/s1/6.jpg',
'/home/username/images/s1/7.jpg',
'/home/username/images/s1/8.jpg',
'/home/username/images/s1/9.jpg',
'/home/username/images/s1/10.jpg',
'/home/username/images/s1/11.jpg',
'/home/username/images/s1/12.jpg']

Natural sorting orders strings the way a person would read them: runs of digits are compared by their numeric value and everything else alphabetically, rather than comparing the strings character by character as a plain sort does.
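
If installing a package is not an option, the idea can be approximated with the standard library. This is only a rough sketch of the principle, not how natsort itself is implemented, and the natural_key helper is a name chosen for this example:

```python
import re

def natural_key(text):
    # re.split with a capturing group keeps the digit runs:
    # '/s1/10.jpg' -> ['/s', '1', '/', '10', '.jpg']
    # Digit runs become ints; the rest is compared as lowercase text.
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", text)]

paths = ['/home/username/images/s1/10.jpg',
         '/home/username/images/s1/2.jpg',
         '/home/username/images/s1/12.jpg',
         '/home/username/images/s1/1.jpg']

result = sorted(paths, key=natural_key)
print(result)
```

Because the split always alternates text and digit runs, text is compared with text and numbers with numbers at each position.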

Upvotes: 13

void

Reputation: 2642

The other answers here are good, but I would like to post mine with some explanations.

from os.path import basename, splitext
path_list = ['/home/username/images/s1/4.jpg', '/home/username/images/s1/7.jpg',
             '/home/username/images/s1/6.jpg', '/home/username/images/s1/3.jpg',
             '/home/username/images/s1/5.jpg', '/home/username/images/s1/10.jpg',
             '/home/username/images/s1/9.jpg', '/home/username/images/s1/1.jpg',
             '/home/username/images/s1/2.jpg', '/home/username/images/s1/12.jpg',
             '/home/username/images/s1/11.jpg', '/home/username/images/s1/8.jpg']

new_list = [splitext(basename(x))[0] for x in path_list]

fin_list = list(zip(path_list,new_list))

fin_list = [x[0] for x in sorted(fin_list,key=lambda x: int(x[1]))]

print(fin_list)

1) Create a list that holds only the file names without their extensions: '4', '7', '6', and so on.

new_list = [splitext(basename(x))[0] for x in path_list]

Note: Why [0]? Because each call to splitext(basename(x)) returns a tuple like this:

('1', '.jpg'), ('4', '.jpg')

so index [0] gives us just the file name without the extension.

2) zip pairs up the items of both iterables, and list turns the result into a list of tuples. So this list has values like this:

fin_list = list(zip(path_list,new_list))
#output
('/home/username/images/s1/4.jpg', '4')

3) [x[0] for x in sorted(fin_list,key=lambda x: int(x[1]))]

This builds the final list from the sorted fin_list. The key is the main thing here: it is the second item of each tuple ('4', '7', '6', ...) converted to an int, and the sorting is based on that value. Taking x[0] afterwards keeps only the full path.

finally your output:

['/home/username/images/s1/1.jpg', '/home/username/images/s1/2.jpg',
 '/home/username/images/s1/3.jpg', '/home/username/images/s1/4.jpg',
 '/home/username/images/s1/5.jpg', '/home/username/images/s1/6.jpg',
 '/home/username/images/s1/7.jpg', '/home/username/images/s1/8.jpg',
 '/home/username/images/s1/9.jpg', '/home/username/images/s1/10.jpg',
 '/home/username/images/s1/11.jpg', '/home/username/images/s1/12.jpg']
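
To see the intermediate values of the three steps, here is a shortened sketch on three of the paths (the variable names stems, pairs and direct are chosen for this example). It also shows that, as far as I can tell, passing the same expression straight to key gives the same result without the zip:

```python
from os.path import basename, splitext

path_list = ['/home/username/images/s1/4.jpg',
             '/home/username/images/s1/10.jpg',
             '/home/username/images/s1/1.jpg']

# Step 1: file names without extension
stems = [splitext(basename(p))[0] for p in path_list]

# Step 2: pair every path with its stem
pairs = list(zip(path_list, stems))

# Step 3: sort the pairs by the numeric stem, then keep only the path
fin_list = [path for path, stem in sorted(pairs, key=lambda pair: int(pair[1]))]

# Same ordering in one pass, computing the key directly from each path
direct = sorted(path_list, key=lambda p: int(splitext(basename(p))[0]))

print(stems)
print(pairs[0])
print(fin_list == direct)
```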

Upvotes: 1
