Mariah_5288

Reputation: 77

Search for files that match any strings from a list?

I want to recursively walk through a directory, find the files that match any of the strings in a given list, and then copy these files to another folder. I thought the any() function would accomplish this, but I get a TypeError that it expected a string, not a list. Is there a more elegant way to do this?

string_to_match = ['apple.txt', 'pear.txt', 'banana.txt']

for root, subdirs, filename in os.walk(source_dir)
    if any(s in filename for s in string_to_match):
        shutil.copy(filename, destination_dir)
        print(filename)

I know glob.glob can work well for finding files that match a specific string or pattern, but I haven't been able to find an answer that allows for multiple matches.

Upvotes: 0

Views: 1663

Answers (3)

mjspier

Reputation: 6536

You can just use the in operator to test whether a file name is in the list.

Example:

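import os
import shutil

string_to_match = ['apple.txt', 'pear.txt', 'banana.txt']

# os.walk yields (root, subdirs, filenames); filenames is a list of names
for root, subdirs, filenames in os.walk(source_dir):
    for filename in filenames:
        if filename in string_to_match:
            shutil.copy(os.path.join(root, filename), destination_dir)
            print(filename)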
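
As a side note on the TypeError from the question: os.walk yields a (root, subdirs, filenames) tuple for each directory, and the third item is a list of names rather than a single name, so handing it straight to shutil.copy fails. The sketch below shows this on a throwaway tree built with tempfile; the helper names walk_shapes and copy_raises_type_error and the file name are made up for illustration.

```python
import os
import shutil
import tempfile


def walk_shapes(top):
    """Return (root, type name of the third item) for each directory under top."""
    return [(root, type(filenames).__name__)
            for root, subdirs, filenames in os.walk(top)]


def copy_raises_type_error(top, dest):
    """Return True if passing os.walk's list of names to shutil.copy raises TypeError."""
    for root, subdirs, filenames in os.walk(top):
        try:
            shutil.copy(filenames, dest)  # wrong on purpose: filenames is a list
        except TypeError:
            return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top, tempfile.TemporaryDirectory() as dest:
        open(os.path.join(top, "apple.txt"), "w").close()
        print(walk_shapes(top))
        print(copy_raises_type_error(top, dest))
```

Looping over the list and joining each name with root avoids the error.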

Here is also a glob version:

import glob
import itertools
import shutil

root_dir = '/home/user'
files = ['apple.txt', 'pear.txt', 'banana.txt']
# one recursive glob per file name, flattened into a single list of paths
files_found = list(itertools.chain.from_iterable(
    glob.glob(f'{root_dir}/**/{f}', recursive=True) for f in files))
for f in files_found:
    shutil.copy(f, destination_dir)

Upvotes: 1

Joran Beasley

Reputation: 114088

I would use sets

import os

def find_names(names, source_dir):
    names = set(names)
    # note: os.walk will walk the subfolders too
    # if you just want source_dir itself, use `names.intersection(os.listdir(source_dir))`
    for root, subdirs, fnames in os.walk(source_dir):
        for matched_name in names.intersection(fnames):
            yield os.path.join(root, matched_name)

strings_to_match = ['apple.txt', 'pear.txt', 'banana.txt']
for match in find_names(strings_to_match, '/path/to/start'):
    print("Match:", match)

[Edited: fixed a typo; the set method is intersection, not intersect.]

(You could alternatively pass in a set {'a','b','c'} instead of a list ['a','b','c'] and skip the conversion to a set.)

Here is an alternative that only looks in the source dir (not its children):

def find_names_in_folder(names,source_dir):
    return [os.path.join(source_dir,n) for n in set(names).intersection(os.listdir(source_dir))]
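
Since the question also asks to copy the matches, here is a minimal sketch that combines the set-based walk with shutil.copy. The function name copy_matching and its parameters are hypothetical, not part of the answer above. It assumes destination_dir is not inside source_dir, and files with the same name in different subfolders would overwrite each other in the destination.

```python
import os
import shutil


def copy_matching(names, source_dir, destination_dir):
    """Copy every file under source_dir whose name is in names.

    Returns the list of source paths that were copied.
    """
    wanted = set(names)
    os.makedirs(destination_dir, exist_ok=True)
    copied = []
    for root, subdirs, fnames in os.walk(source_dir):
        # sorted() only makes the order within one folder deterministic
        for name in sorted(wanted.intersection(fnames)):
            src = os.path.join(root, name)
            shutil.copy(src, destination_dir)
            copied.append(src)
    return copied
```

It could be called as copy_matching(['apple.txt', 'pear.txt', 'banana.txt'], source_dir, destination_dir), using the variable names from the question.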

Upvotes: 0

Park

Reputation: 384

First, testing whether an element is in a list takes O(n) time, so convert the list to a set, where a lookup takes O(1) on average.

Then you can do it like this:

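import os
import shutil

string_to_match = {'apple.txt', 'pear.txt', 'banana.txt'}
# os.listdir only looks at source_dir itself, not its subfolders
for filename in os.listdir(source_dir):
    if filename in string_to_match:
        # join with source_dir so the copy works from any working directory
        shutil.copy(os.path.join(source_dir, filename), destination_dir)
        print(filename)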
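
To illustrate the point about lookup cost, here is a rough timing sketch using timeit. The helper name time_membership and the sizes are arbitrary choices for illustration; actual numbers depend on the machine, and for a three-item list the difference is negligible.

```python
import timeit


def time_membership(container, probe, repeat=1000):
    """Return the seconds taken by `repeat` membership tests of probe in container."""
    return timeit.timeit(lambda: probe in container, number=repeat)


if __name__ == "__main__":
    names_list = [f"file_{i}.txt" for i in range(100_000)]
    names_set = set(names_list)
    probe = "missing.txt"  # worst case for the list: every element is compared
    print("list:", time_membership(names_list, probe))
    print("set: ", time_membership(names_set, probe))
```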

    

Upvotes: 1
