RoyM

Reputation: 747

Sort list of objects by string-attribute

I've generated many image files (PNG) within a folder. Each has a name like "img0.png", "img1.png", ..., "img123164971.png". The order of these images matters to me, and the numerical part represents the order in which I need to retrieve them before I add them to an HTML form.

This question comes close to giving me a solution: Does Python have a built in function for string natural sort?

But I'm not entirely sure how to incorporate it into my specific code:

imagedata = list()
files_and_dirs = Path(imagefolder).glob('**/*')
images = [x for x in files_and_dirs if x.is_file() and x.suffix == '.png']

for image in images:
    imagedata.append("<img src='{0}/{1}' width='200'>".format(imagefolder, image.name))

These files are read in plain alphanumeric order, but that is not what I want. I have a feeling that I can simply do images = sort_function(images), but I'm unsure how exactly. I realize I can do this:

import re
from pathlib import Path

def natural_sort(l):
    # Split each string into text and digit chunks so numbers compare numerically
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(l, key=alphanum_key)

imagedata = list()
files_and_dirs = Path(imagefolder).glob('**/*')
# Keep only the file names (strings), not the Path objects
images = [x.name for x in files_and_dirs if x.is_file() and x.suffix == '.png']
images = natural_sort(images)

for image in images:
    imagedata.append("<img src='{0}/{1}' width='200'>".format(imagefolder, image))

This uses Mark Byers' solution from the linked question. But later on I need the list of the actual image files themselves, and it seems redundant to keep two lists when one of them contains all the data in the other. Instead, I would very much like to sort the list of image files by file name, in that natural order. Or better yet, read them from the folder in that order, if possible. Any advice?

Edit: I changed the title, making it a bit more concise and hopefully still accurate.

Upvotes: 2

Views: 1008

Answers (2)

cfelipe

Reputation: 350

You mean you just want to sort imagedata? Not pretty, but try:

imagedata.sort(key=lambda x: int(re.search(r"(\d+)\.png'", x)[1]))

The regex captures the digits immediately before .png' at the end of the src value, that is, the number in the file name. This assumes every file name ends in digits followed by .png.
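
For illustration, here is a minimal sketch of sorting the tags by the number in the file name. The helper name image_number and the sample tags are made up for this example, and it assumes every file name ends in digits followed by .png:

```python
import re

def image_number(tag):
    # Hypothetical helper: pull out the digits right before ".png" in the tag.
    # Assumes every tag contains a file name ending in digits followed by .png.
    match = re.search(r"(\d+)\.png", tag)
    return int(match.group(1))

# Made-up tags in the same shape the question builds
imagedata = [
    "<img src='imgs/img10.png' width='200'>",
    "<img src='imgs/img2.png' width='200'>",
    "<img src='imgs/img1.png' width='200'>",
]

# Sort numerically, so img2 comes before img10
imagedata.sort(key=image_number)
print(imagedata)
```

Converting to int is what makes img2 sort ahead of img10; comparing the digit strings directly would give plain alphabetical order again.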

Upvotes: 0

Bill M.

Reputation: 1548

Assuming you really want things "naturally sorted" strictly by the name of the individual file, as opposed to the full path (e.g., so that "zzz/image01.png" comes before "aaa/image99.png"), one way to do this is to create an ordered dictionary where the keys are the filenames and the values are the "<img>" tags you're ultimately creating. (EDIT: I see now from the comments that this isn't the case.) Then do a natural sort of the dictionary keys and return a list of the corresponding values.

So, using a simple list of 3 made-up files and adding a twist to the original natural_sort function:

import collections
import re

def files_with_natural_sort(l):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ]
    return [ l[newkey] for newkey in sorted(l, key = alphanum_key) ]

original_files = ["folder_c/file9.png", "folder_a/file11.png", "folder_b/file10.png"]

image_dict = collections.OrderedDict()

for file in original_files:
    [folder, filename] = file.split('/')
    image_dict[filename] = '<img src="%s" width="200">' % file

# The function returns the dictionary values (the tags), ordered by their keys
sorted_tags = files_with_natural_sort(image_dict)
print(sorted_tags)

This outputs:

['<img src="folder_c/file9.png" width="200">', '<img src="folder_b/file10.png"
    width="200">', '<img src="folder_a/file11.png" width="200">']

A regular dictionary would work just as well here, since the function sorts the keys itself, but the version above still works. As for building the list in the desired order as you read the files, I suppose you could do some fancy sorting as you insert, but I really wouldn't sweat it. Unless you have millions of files, I don't see the harm in using multiple lists.
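
If you would rather keep a single list, one option is to sort the Path objects themselves and pass a key function that looks only at each file's name. The following is a rough sketch, not a tested drop-in: natural_key is a name made up here, and the three paths are placeholders standing in for whatever Path(imagefolder).glob('**/*') returns.

```python
import re
from pathlib import Path

def natural_key(path):
    # Hypothetical helper: split the file name into text and digit chunks,
    # turning digit chunks into ints so they compare numerically.
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r'([0-9]+)', path.name)]

# Placeholder paths; in the question these would come from the glob
images = [Path("imgs/img10.png"), Path("imgs/img9.png"), Path("imgs/img123.png")]

# Sort the Path objects in place, keyed on the file name only
images.sort(key=natural_key)

# The same list still holds the Path objects, so the tags can be built from it
imagedata = ["<img src='{0}' width='200'>".format(p.as_posix()) for p in images]
print([p.name for p in images])
```

Because the key is computed from path.name, the list keeps the full Path objects for later use, and no second list of names is needed.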

Upvotes: 1
