Reputation: 21961

Order file based on numbers in name

I have a bunch of files with names as follows:

tif_files = av_v5_1983_001.tif, av_v5_1983_002.tif, av_v5_1983_003.tif...av_v5_1984_001.tif, av_v5_1984_002.tif...av_v5_2021_001.tif, av_v5_2021_002.tif

However, they are not guaranteed to be in any sort of order.

I want to sort them based on names such that files from the same year are sorted together. When I do this

sorted(tif_files, key=lambda x:x.split('_')[-1][:-4])

I get the following result:

av_v5_1983_001.tif, av_v5_1984_001.tif, av_v5_1985_001.tif...av_v5_2021_001.tif

but I want this:

av_v5_1983_001.tif, av_v5_1983_002.tif, av_v5_1983_003.tif...av_v5_1984_001.tif, av_v5_1984_002.tif...av_v5_2021_001.tif, av_v5_2021_002.tif

Upvotes: 1

Answers (6)

khanh

Reputation: 625

If key returns a tuple of two values, sorted() compares the first value and uses the second value only to break ties. See https://stackoverflow.com/a/5292332/9532450

tif_files = [
    "hea_der_1983_002.tif",
    "hea_der_1983_001.tif",
    "hea_der_1984_002.tif",
    "hea_der_1984_001.tif",
]


def parse(filename: str) -> tuple[str, str]:
    split = filename.split("_")
    return split[2], split[3]


sort = sorted(tif_files, key=parse)
print(sort)

Output:

['hea_der_1983_001.tif', 'hea_der_1983_002.tif', 'hea_der_1984_001.tif', 'hea_der_1984_002.tif']
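A variation on the same tuple-key idea, offered as a sketch rather than a drop-in answer: it assumes every name ends in `_<year>_<index>.tif` and converts both parts to integers so they compare numerically. The helper name `year_and_index`, the regular expression, and the sample name without zero padding are illustrative assumptions, not part of the question.

```python
import re

tif_files = [
    "av_v5_1984_002.tif",
    "av_v5_1983_010.tif",
    "av_v5_1983_2.tif",  # hypothetical name without zero padding
    "av_v5_1984_001.tif",
]


def year_and_index(filename: str) -> tuple[int, int]:
    """Return (year, index) parsed from the end of the file name."""
    match = re.search(r"_(\d{4})_(\d+)\.tif$", filename)
    if match is None:
        # Fail loudly instead of silently mis-sorting an unexpected name.
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(match.group(1)), int(match.group(2))


ordered = sorted(tif_files, key=year_and_index)
print(ordered)
```

With integer keys, index 2 sorts before index 10 within the same year, which a string comparison of '2' and '010' would not give you.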

Upvotes: 1

I'mahdi

Reputation: 24049

If the version can vary (v1, v2, ... v5, ...), you also need to take the version number into account, as below:

tif_files = ['av_v1_1983_001.tif', 'av_v5_1983_002.tif', 'av_v6_1983_002.tif','av_v5_1984_001.tif', 'av_v5_1984_002.tif', 'av_v4_2021_001.tif','av_v5_2021_001.tif', 'av_v5_2021_002.tif', 'av_v4_1984_002.tif']

sorted(tif_files, key=lambda x: [x.split('_')[2:], x.split('_')[1]])

Output:

['av_v1_1983_001.tif',
 'av_v5_1983_002.tif',
 'av_v6_1983_002.tif',
 'av_v5_1984_001.tif',
 'av_v4_1984_002.tif',
 'av_v5_1984_002.tif',
 'av_v4_2021_001.tif',
 'av_v5_2021_001.tif',
 'av_v5_2021_002.tif']

Upvotes: 1

Angus B

Reputation: 129

As long as your naming convention remains consistent, you should be able to just sort them alphanumerically. The code below should work:

sorted(tif_files)
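To illustrate what "consistent" means here, the sketch below compares a plain sort with a year-based key on a small made-up list in which the version prefix differs. The sample names are assumptions chosen for the demonstration.

```python
# Same prefix everywhere: a plain sort already groups by year, then index.
consistent = ["av_v5_2021_002.tif", "av_v5_1983_003.tif", "av_v5_1983_001.tif"]
print(sorted(consistent))

# Mixed prefixes (hypothetical): a plain sort compares "v4" vs "v5" before the year.
mixed = ["av_v5_1983_001.tif", "av_v4_1984_002.tif", "av_v5_1984_001.tif"]
plain = sorted(mixed)
by_year = sorted(mixed, key=lambda name: name.split("_")[2:])

print(plain)    # the v4 file from 1984 comes first
print(by_year)  # the 1983 file comes first
```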

If you instead wanted to sort by the last two numbers in the file name while ignoring the prefix, you would need something a bit more dramatic that would break those numbers out and let you order by them. You could use something like the below:

import pandas as pd
# Strip the ".tif" extension before converting the last part to int.
tif_files_list = [[xx, int(xx.split("_")[2]), int(xx.split("_")[3].split(".")[0])] for xx in tif_files]
tif_files_frame = pd.DataFrame(tif_files_list, columns=["Name", "Primary Index", "Secondary Index"])
tif_files_frame_ordered = tif_files_frame.sort_values(["Primary Index", "Secondary Index"], axis=0)
tif_files_ordered = tif_files_frame_ordered["Name"].tolist()

This breaks the numbers in the names out into separate columns of a pandas DataFrame, then sorts the entries by those columns, at which point you can extract the ordered name column on its own.

Upvotes: 1

Epsi95

Reputation: 9047

Take the last two parts of the split name using [2:]; for example, 'av_v5_1984_001.tif' gives ['1984', '001.tif'].

tif_files = 'av_v5_1983_001.tif', 'av_v5_1983_002.tif', 'av_v5_1983_003.tif',\
            'av_v5_1984_001.tif', 'av_v5_1984_002.tif', 'av_v5_2021_001.tif', 'av_v5_2021_002.tif'

sorted(tif_files, key=lambda x: x.split('_')[2:])

# ['av_v5_1983_001.tif',
#  'av_v5_1983_002.tif',
#  'av_v5_1983_003.tif',
#  'av_v5_1984_001.tif',
#  'av_v5_1984_002.tif',
#  'av_v5_2021_001.tif',
#  'av_v5_2021_002.tif']
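One caveat, sketched under an assumption the question does not confirm: if the entries are full paths (for example from a directory listing) and a directory name contains an underscore, splitting the whole string shifts the parts. Splitting only the file stem avoids that. The paths and the `name_key` helper below are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical full paths whose directory names also contain underscores.
tif_paths = [
    "/data/run_2/av_v5_1984_001.tif",
    "/data/run_1/av_v5_1983_002.tif",
    "/data/run_3/av_v5_1983_001.tif",
]


def name_key(path: str) -> list[str]:
    """Split only the file stem, e.g. 'av_v5_1983_001' -> ['1983', '001']."""
    return PurePosixPath(path).stem.split("_")[2:]


ordered_paths = sorted(tif_paths, key=name_key)
print(ordered_paths)
```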

Upvotes: 2

Michael Teguh Laksana

Reputation: 336

Your key sorts only by the 00x index, because x.split('_')[-1][:-4] yields 001, 002 and so on; the year is ignored. Because Python's sort is stable, you can sort by the index first and then sort that result by year:

by_index = sorted(tif_files, key=lambda x: x.split('_')[-1][:-4])
sorted(by_index, key=lambda x: x.split('_')[2])
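A runnable sketch of the two-pass approach on a small sample list (the sample data is made up). It relies on Python's sort being stable, so the least significant key (the index) has to be sorted first and the most significant key (the year) last. The final line checks the result against a single sort with a tuple key.

```python
tif_files = [
    "av_v5_1984_002.tif",
    "av_v5_1983_002.tif",
    "av_v5_1984_001.tif",
    "av_v5_1983_001.tif",
]

two_pass = list(tif_files)  # copy, so the original list is left untouched
two_pass.sort(key=lambda x: x.split("_")[-1][:-4])  # pass 1: index (001, 002, ...)
two_pass.sort(key=lambda x: x.split("_")[2])        # pass 2: year; ties keep pass-1 order

# Equivalent single pass with a (year, index) tuple key.
one_pass = sorted(tif_files, key=lambda x: (x.split("_")[2], x.split("_")[-1][:-4]))

print(two_pass)
print(two_pass == one_pass)
```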

Upvotes: 1

user16436946

Reputation: 129

Right-click your folder and choose Sort by > Name.

Upvotes: -2
