Reputation: 21961
I have a bunch of file with names as follows:
tif_files = av_v5_1983_001.tif, av_v5_1983_002.tif, av_v5_1983_003.tif...av_v5_1984_001.tif, av_v5_1984_002.tif...av_v5_2021_001.tif, av_v5_2021_002.tif
However, they are not guaranteed to be in any sort of order.
I want to sort them based on names such that files from the same year are sorted together. When I do this
sorted(tif_files, key=lambda x:x.split('_')[-1][:-4])
I get the following result:
av_v5_1983_001.tif, av_v5_1984_001.tif, av_v5_1985_001.tif...av_v5_2021_001.tif
but I want this:
av_v5_1983_001.tif, av_v5_1983_002.tif, av_v5_1983_003.tif...av_v5_1984_001.tif, av_v5_1984_002.tif...av_v5_2021_001.tif, av_v5_2021_002.tif
Upvotes: 1
Views: 109
Reputation: 625
If key
returns a tuple of 2 values, the sort
function will try to sort based on the first value then the second value.
please refer to: https://stackoverflow.com/a/5292332/9532450
tif_files = [
"hea_der_1983_002.tif",
"hea_der_1983_001.tif",
"hea_der_1984_002.tif",
"hea_der_1984_001.tif",
]
def parse(filename: str) -> tuple[str, str]:
split = filename.split("_")
return split[2], split[3]
sort = sorted(tif_files, key=parse)
print(sort)
output
['hea_der_1983_001.tif', 'hea_der_1983_002.tif', 'hea_der_1984_001.tif', 'hea_der_1984_002.tif']
Upvotes: 1
Reputation: 24049
if you have v1
or v2
or ... v5
or ... you need to consider number of version also like below:
tif_files = ['av_v1_1983_001.tif', 'av_v5_1983_002.tif', 'av_v6_1983_002.tif','av_v5_1984_001.tif', 'av_v5_1984_002.tif', 'av_v4_2021_001.tif','av_v5_2021_001.tif', 'av_v5_2021_002.tif', 'av_v4_1984_002.tif']
sorted(tif_files, key=lambda x: [x.split('_')[2:], x.split('_')[1]])
Output:
['av_v1_1983_001.tif',
'av_v5_1983_002.tif',
'av_v6_1983_002.tif',
'av_v5_1984_001.tif',
'av_v4_1984_002.tif',
'av_v5_1984_002.tif',
'av_v4_2021_001.tif',
'av_v5_2021_001.tif',
'av_v5_2021_002.tif']
Upvotes: 1
Reputation: 129
As long as your naming convention remains consistent, you should be able to just sort them alphanumerically. As such, the below code should work;
sorted(tif_files)
If you instead wanted to sort by the last two numbers in the file name while ignoring the prefix, you would need something a bit more dramatic that would break those numbers out and let you order by them. You could use something like the below:
import pandas as pd
tif_files_list = [[xx, int(xx.split("_")[2]), int(xx.split("_")[3])] for xx in tif_files]
tif_files_frame = pd.DataFrame(tif_files_list, columns=["Name", "Primary Index", "Secondary Index"])
tif_files_frame_ordered = tif_files_frame.sort_values(["Primary Index", "Secondary Index"], axis=0)
tif_files_ordered = tif_files_frame_ordered["Name"].tolist()
This breaks the numbers in the names out into separate columns of a Pandas Dataframe, then sorts your entries by those broken out columns, at which point you can extract the ordered name column on its own.
Upvotes: 1
Reputation: 9047
take the last two using [2:]
for example ['1984', '001.tif']
tif_files = 'av_v5_1983_001.tif', 'av_v5_1983_002.tif', 'av_v5_1983_003.tif',\
'av_v5_1984_001.tif', 'av_v5_1984_002.tif', 'av_v5_2021_001.tif', 'av_v5_2021_002.tif'
sorted(tif_files, key=lambda x: x.split('_')[2:])
# ['av_v5_1983_001.tif',
# 'av_v5_1983_002.tif',
# 'av_v5_1983_003.tif',
# 'av_v5_1984_001.tif',
# 'av_v5_1984_002.tif',
# 'av_v5_2021_001.tif',
# 'av_v5_2021_002.tif']
Upvotes: 2
Reputation: 336
What you did was sorting it by the 00x
index first then by the year as x.split('_')[-1]
produces 001
and etc. Try to change the index to sort by year first , then sort it again by the index:
sorted(tif_files, key=lambda x:x.split('_')[2])
sorted(tif_files, key=lambda x:x.split('_')[-1][:-4])
Upvotes: 1