Reputation: 890
Problem Description
I have a list of files ["FA_1","FA_2","FB_1","FB_2","FC_1","FC_2"]. That list has 3 different file names: FA, FB and FC. For each of FA, FB and FC, I am trying to retrieve the one with the max number. The following script that I coded does that, but it's so complicated and ugly.
Is there a way to make it simpler?
A similar question was asked in Find file in directory with the highest number in the filename. However, that question only deals with files that all share the same file name.
#!/usr/bin/env python
import sys
import os
from collections import defaultdict


def load_newest_files():
    # Retrieve all files for the component in question
    list_of_files = defaultdict(list)
    new_list_of_files = []
    files = ["FA_1","FA_2","FB_1","FB_2","FC_1","FC_2"]
    # Split files and take the filename without the id.
    # The files are now separated in buckets of FA, FB and FC
    # I can now retrieve the file with the max number and put
    # it in a list
    for file in files:
        list_of_files[file.split("_")[0]].append(file)
    for key,value in list_of_files.items():
        new_list_of_files.append(max(value))
    print(new_list_of_files)


def main():
    load_newest_files()


if __name__ == "__main__":
    main()
Upvotes: 0
Views: 108
Reputation: 4860
Why do you think it is complicated and ugly?
You could use a list comprehension instead of these 3 lines:
new_list_of_files = []
# [...]
for key,value in list_of_files.items():
    new_list_of_files.append(max(value))
Like so:
new_list_of_files = [max(value) for value in list_of_files.values()]
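One caveat with max(value): it compares the names as strings, so "FA_10" would rank below "FA_2". A minimal sketch of a numeric comparison, assuming the part after the last underscore is always an integer (the helper names file_number and newest_per_prefix are hypothetical, not from the question):

```python
from collections import defaultdict


def file_number(name):
    """Return the integer after the last underscore, e.g. 'FA_10' -> 10."""
    return int(name.rsplit("_", 1)[1])


def newest_per_prefix(files):
    """Group names by prefix and keep the one with the highest number."""
    groups = defaultdict(list)
    for name in files:
        groups[name.rsplit("_", 1)[0]].append(name)
    # Compare by the numeric suffix instead of by the raw string.
    return [max(names, key=file_number) for names in groups.values()]


print(newest_per_prefix(["FA_1", "FA_2", "FA_10", "FB_1", "FB_2", "FC_1"]))
# ['FA_10', 'FB_2', 'FC_1']
```

With single-digit numbers, as in the question, the plain max(value) gives the same result.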
Alternatively, you can sort the list of files in reverse, then iterate over it, adding only the first instance of each filename prefix (which will be the highest) to a new list, using a set to keep track of which prefixes have already been added.
files = ["FA_1", "FA_2", "FB_1", "FB_2", "FC_1", "FC_2"]
files.sort(reverse=True)
already_seen = set()
new_filenames = []
for file in files:
    prefix = file.split("_")[0]
    if prefix not in already_seen:
        already_seen.add(prefix)
        new_filenames.append(file)
print(new_filenames)
Output: ['FC_2', 'FB_2', 'FA_2']
You can get it down to 2 lines with a complicated and ugly list comprehension:
files = ["FA_1", "FA_2", "FB_1", "FB_2", "FC_1", "FC_2"]
already_seen = set()
new_filenames = [(file, already_seen.add(prefix))[0] for file in files[::-1] if (prefix := file.split("_")[0]) not in already_seen]
print(new_filenames)
Upvotes: 1
Reputation: 31
You can use the re (regular expression) library and sort(). An example is shown below.
import re


def load_newest_files():
    files = ["FA_1", "FA_2", "FB_1", "FB_2", "FC_1", "FC_2"]
    # Sort the list so the highest number for each prefix comes last
    files.sort()
    concat_files = " ".join(files)
    # dict() keeps only the last (prefix, number) pair seen for each prefix.
    # Note: the pattern matches a single digit only.
    a = dict(re.findall('(.*?)_([0-9])[ ]?', concat_files))
    new_list_of_files = ["%s_%s" % (i, j) for i, j in a.items()]
    return new_list_of_files


def main():
    newest_files = load_newest_files()
    print(newest_files)


if __name__ == "__main__":
    main()
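The pattern above matches a single digit and relies on the sorted order so that the last entry per prefix wins in the dict. If the numbers can have more than one digit, one option is to match each name on its own and compare the numbers as integers. A sketch under that assumption (the function name newest_by_regex and the strict prefix_number naming scheme are assumptions, not part of the answer above):

```python
import re

# Assumed naming scheme: <prefix>_<digits>, e.g. "FA_12".
NAME_PATTERN = re.compile(r"(.+)_(\d+)")


def newest_by_regex(files):
    """Return, per prefix, the file name with the largest numeric suffix."""
    best = {}  # prefix -> (number, name)
    for name in files:
        match = NAME_PATTERN.fullmatch(name)
        if match is None:
            continue  # skip names that do not follow the assumed scheme
        prefix, number = match.group(1), int(match.group(2))
        if prefix not in best or number > best[prefix][0]:
            best[prefix] = (number, name)
    return [name for _, name in best.values()]


print(newest_by_regex(["FA_9", "FA_10", "FB_1", "FB_2", "FC_1", "FC_2"]))
# ['FA_10', 'FB_2', 'FC_2']
```

This version does not need the input to be sorted, since each prefix keeps track of its own maximum.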
Upvotes: 1
Reputation: 14216
You can use itertools.groupby and create custom grouping and maximum functions for the key arguments. An example is shown below.
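from itertools import groupby


def custom_group(item):
    # Group by the part before the underscore, e.g. "FA"
    x, _ = item.split("_")
    return x


def custom_max(item):
    # Compare by the number after the underscore, as an integer
    _, y = item.split("_")
    return int(y)


files = ["FA_1", "FA_2", "FB_1", "FB_2", "FC_1", "FC_2"]
new_list_of_files = []
for _, v in groupby(files, key=custom_group):
    val = max(v, key=custom_max)
    new_list_of_files.append(val)
print(new_list_of_files)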
> ['FA_2', 'FB_2', 'FC_2']
Please make sure to read the caveats surrounding itertools.groupby regarding the sort order of your input data: it only groups consecutive items with the same key, so the list should be sorted by prefix first.
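To illustrate the sort-order caveat, here is a small sketch with a deliberately shuffled list (the helper names prefix_of and number_of and the shuffled input are made up for this illustration):

```python
from itertools import groupby


def prefix_of(name):
    return name.split("_")[0]


def number_of(name):
    return int(name.split("_")[1])


# Deliberately unsorted input.
files = ["FB_2", "FA_1", "FC_1", "FA_2", "FB_1", "FC_2"]

# Without sorting, groupby starts a new group every time the prefix changes.
unsorted_result = [
    max(group, key=number_of) for _, group in groupby(files, key=prefix_of)
]

# Sorting by the grouping key first puts equal prefixes next to each other.
sorted_result = [
    max(group, key=number_of)
    for _, group in groupby(sorted(files, key=prefix_of), key=prefix_of)
]

print(unsorted_result)  # ['FB_2', 'FA_1', 'FC_1', 'FA_2', 'FB_1', 'FC_2']
print(sorted_result)    # ['FA_2', 'FB_2', 'FC_2']
```

In the unsorted case every file ends up in its own group, so nothing is filtered out.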
Upvotes: 2