Hani Gotc

Find file in directory with the max number given a set of different file names

Problem Description

I have a list of files ["FA_1","FA_2","FB_1","FB_2","FC_1","FC_2"]. It contains three different file names: FA, FB and FC. For each of FA, FB and FC, I am trying to retrieve the file with the highest number. The script below does that, but it's so complicated and ugly.

Is there a way to make it simpler?

A similar question was asked in Find file in directory with the highest number in the filename, but there all the files share the same name.


#!/usr/bin/env python

import sys
import os
from collections import defaultdict

def load_newest_files():
    # Retrieve all files for the component in question
    list_of_files = defaultdict(list)
    new_list_of_files = []
    files = ["FA_1","FA_2","FB_1","FB_2","FC_1","FC_2"]

    # Split each file name and use the part before the "_" as the key,
    # so the files are now separated into buckets for FA, FB and FC.
    # Then I can retrieve the file with the max number from each
    # bucket and put it in a list
    for file in files:
        list_of_files[file.split("_")[0]].append(file)

    for key,value in list_of_files.items():
        new_list_of_files.append(max(value))
    
    print(new_list_of_files)


def main():
    load_newest_files()

if __name__ == "__main__":
    main()

GordonAitchJay

Why do you think it is complicated and ugly?

You could use a list comprehension instead of these 3 lines:

    new_list_of_files = []
    # [...]
    for key,value in list_of_files.items():
        new_list_of_files.append(max(value))

Like so:

new_list_of_files = [max(value) for value in list_of_files.values()]
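Dropped into the question's function, the comprehension might look like the sketch below. Two things here are assumptions for illustration: the function takes the list as a parameter instead of hard-coding it, and the part after the last underscore is treated as an integer. The second matters because a plain max() on strings compares character by character, so it would rank "FA_9" above "FA_10" (the "FA_10" entry below is made-up test data).

```python
from collections import defaultdict


def load_newest_files(files):
    # Bucket the file names by their prefix (the text before the last "_")
    list_of_files = defaultdict(list)
    for file in files:
        list_of_files[file.rsplit("_", 1)[0]].append(file)

    # Keep the highest-numbered file per prefix; assumes the suffix is an int
    return [max(value, key=lambda f: int(f.rsplit("_", 1)[1]))
            for value in list_of_files.values()]


print(load_newest_files(["FA_1", "FA_2", "FA_10", "FB_1", "FB_2", "FC_1", "FC_2"]))
```

This prints ['FA_10', 'FB_2', 'FC_2']; dicts keep insertion order (Python 3.7+), so the prefixes come out in the order they first appear.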

Alternatively, you can sort the list of files in reverse, then iterate over it, adding only the first instance (which will be the highest) of each filename prefix to a new list and using a set to keep track of which prefixes have already been added.

files = ["FA_1", "FA_2", "FB_1", "FB_2", "FC_1", "FC_2"]
files.sort(reverse=True)

already_seen = set()
new_filenames = []
for file in files:
    prefix = file.split("_")[0]
    if prefix not in already_seen:
        already_seen.add(prefix)
        new_filenames.append(file)

print(new_filenames)

Output: ['FC_2', 'FB_2', 'FA_2']
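One caveat with sorting the strings directly: the comparison is lexicographic, so "FA_9" sorts above "FA_10" and would be picked as the highest. A minimal variant, assuming the suffix after the last underscore is always an integer, sorts on a (prefix, number) key instead. The sort_key helper and the "FA_10" entry are illustrative additions, not part of the answer above.

```python
files = ["FA_1", "FA_2", "FA_10", "FB_1", "FB_2", "FC_1", "FC_2"]


def sort_key(name):
    # Split "FA_10" into ("FA", 10) so the number compares as an int
    prefix, number = name.rsplit("_", 1)
    return prefix, int(number)


already_seen = set()
new_filenames = []
# Highest prefix first, and within a prefix, highest number first
for file in sorted(files, key=sort_key, reverse=True):
    prefix = file.rsplit("_", 1)[0]
    if prefix not in already_seen:
        already_seen.add(prefix)
        new_filenames.append(file)

print(new_filenames)
```

With this key the output is ['FC_2', 'FB_2', 'FA_10'] rather than picking "FA_2".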

You can get it down to 2 lines with a complicated and ugly list comprehension (the := walrus operator needs Python 3.8 or later):

files = ["FA_1", "FA_2", "FB_1", "FB_2", "FC_1", "FC_2"]
already_seen = set()
new_filenames = [(file, already_seen.add(prefix))[0] for file in sorted(files, reverse=True) if (prefix := file.split("_")[0]) not in already_seen]
print(new_filenames)
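If the aim is a short expression without the side-effecting set, one alternative (not from the answer above) is a dict comprehension: when a key repeats, the later entry overwrites the earlier one, so iterating in ascending numeric order leaves the highest-numbered file for each prefix. This sketch assumes the suffix after the last underscore is an integer.

```python
files = ["FA_1", "FA_2", "FB_1", "FB_2", "FC_1", "FC_2"]

# Ascending by number, so the last file seen for each prefix is the highest
ordered = sorted(files, key=lambda f: int(f.rsplit("_", 1)[1]))
newest = {f.rsplit("_", 1)[0]: f for f in ordered}
new_filenames = list(newest.values())

print(new_filenames)
```

This prints ['FA_2', 'FB_2', 'FC_2'], and because the sort uses an int key it also handles multi-digit numbers.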

Kshitij Srivastava

You can use the re module together with sort(). An example is shown below.

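import re

def load_newest_files():
    files = ["FA_1", "FA_2", "FB_1", "FB_2", "FC_1", "FC_2"]
    # Sort by the numeric suffix so that, for each prefix,
    # the highest-numbered file comes last
    files.sort(key=lambda f: int(f.rsplit("_", 1)[1]))
    concat_files = " ".join(files)
    # Each match is a (prefix, number) pair; when a prefix repeats,
    # dict() keeps the last (highest) number for it
    a = dict(re.findall(r"(\S+)_([0-9]+)", concat_files))
    new_list_of_files = ["%s_%s" % (i, j) for i, j in a.items()]
    return new_list_of_files
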
def main():
    newest_files = load_newest_files()
    print(newest_files)

if __name__ == "__main__":
    main()
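A variation on the regex idea, sketched here rather than taken from the answer: instead of joining everything into one string, match each name on its own with re.fullmatch and keep the best number seen per prefix in a dict. The PATTERN name and the PREFIX_NUMBER naming scheme it encodes are assumptions; names that don't fit it are skipped.

```python
import re

# Assumed naming scheme: any prefix, an underscore, then digits
PATTERN = re.compile(r"(.+)_(\d+)")


def load_newest_files(files):
    best = {}  # prefix -> (number, file name)
    for name in files:
        match = PATTERN.fullmatch(name)
        if match is None:
            continue  # not PREFIX_NUMBER; ignore it
        prefix, number = match.group(1), int(match.group(2))
        if prefix not in best or number > best[prefix][0]:
            best[prefix] = (number, name)
    return [name for _, name in best.values()]


print(load_newest_files(["FA_1", "FA_2", "FB_1", "FB_2", "FC_1", "FC_2", "notes.txt"]))
```

This prints ['FA_2', 'FB_2', 'FC_2']; it needs no sorting because each prefix's best candidate is tracked in a single pass.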

gold_cy

You can use itertools.groupby and create custom grouping and maximum functions for the key arguments. Example is shown below.

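from itertools import groupby

files = ["FA_1", "FA_2", "FB_1", "FB_2", "FC_1", "FC_2"]
new_list_of_files = []

def custom_group(item):
    # Group key: the prefix before the underscore, e.g. "FA"
    x, _ = item.split("_")
    return x

def custom_max(item):
    # Max key: the numeric suffix as an int, so "FA_10" beats "FA_9"
    _, y = item.split("_")
    return int(y)

# groupby only merges consecutive items, so sort by the group key first
for _, v in groupby(sorted(files, key=custom_group), key=custom_group):
    val = max(v, key=custom_max)
    new_list_of_files.append(val)

print(new_list_of_files)

Output: ['FA_2', 'FB_2', 'FC_2']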

Note that itertools.groupby only groups consecutive items with the same key, so the input has to be sorted by the grouping key first; see the caveats in the itertools.groupby documentation.
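To see why the sort order matters: groupby only merges runs of consecutive items that share a key, so on unsorted input the same prefix can show up as several groups. The short list and the helper names prefix and number below are made up for illustration.

```python
from itertools import groupby


def prefix(item):
    # Group key: the text before the underscore, e.g. "FA"
    return item.split("_")[0]


def number(item):
    # Max key: the numeric suffix as an int
    return int(item.split("_")[1])


files = ["FA_1", "FB_1", "FA_2"]  # hypothetical, deliberately unsorted

unsorted_result = [max(g, key=number) for _, g in groupby(files, key=prefix)]
sorted_result = [max(g, key=number)
                 for _, g in groupby(sorted(files, key=prefix), key=prefix)]

print(unsorted_result)  # FA appears twice: ['FA_1', 'FB_1', 'FA_2']
print(sorted_result)    # one entry per prefix: ['FA_2', 'FB_1']
```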
