ensnare

Reputation: 42093

In Python, fastest way to build a list of files in a directory with a certain extension

In Python on a GNU/Linux system, what's the fastest way to recursively scan a directory for all .MOV or .AVI files, and to store them in a list?

Upvotes: 10

Views: 15695

Answers (8)

Jundiaius

Reputation: 7678

From Python 3.12 onwards, you can use Path.walk from the pathlib module.

Because it works with Path objects rather than string paths, pathlib makes it easier to combine paths and gives you the .suffix property.

from pathlib import Path


suffixes = {'.AVI', '.MOV'}
files_with_suffix = []

for root, dirs, files in Path(".").walk():
    for file in files:
        # Path.walk yields file names as plain strings, so join with root first
        path = root / file
        if path.suffix in suffixes:
            files_with_suffix.append(path)

Upvotes: 2

H. Sánchez

Reputation: 626

You can also use pathlib for this.

from pathlib import Path

# path is the directory to search (a str or a Path)
files_mov = list(Path(path).rglob("*.MOV"))
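
To collect both extensions in a single pass, one option is to iterate over rglob("*") and filter on the lower-cased suffix. This is only a minimal sketch: the helper name find_videos and the VIDEO_SUFFIXES set are hypothetical, and it assumes case-insensitive matching is wanted.

```python
from pathlib import Path

# Hypothetical set of wanted extensions, kept in lower case.
VIDEO_SUFFIXES = {".mov", ".avi"}


def find_videos(top):
    """Return all regular files under `top` ending in .mov or .avi (any case)."""
    return [
        p
        for p in Path(top).rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    ]
```

Calling find_videos(".") returns a list of Path objects for the current directory tree.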

Upvotes: 0

Rik Poggi

Reputation: 29302

I suggest using os.walk and reading its documentation carefully.

A one-liner approach could be:

import os
[os.path.join(root, f) for root, dirs, files in os.walk('/your/path') for f in files if is_video(f)]

where is_video is a function of yours that checks the file's extension.
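
The is_video check is left to the reader above. One possible version, as a sketch only: the extension tuple and the wrapper name list_videos are assumptions, and the comparison is made case-insensitive by lower-casing the file name.

```python
import os

# Assumed extensions, lower-cased so the check ignores case.
VIDEO_EXTENSIONS = (".mov", ".avi")


def is_video(filename):
    """Return True if the file name ends with one of the video extensions."""
    # str.endswith accepts a tuple of suffixes.
    return filename.lower().endswith(VIDEO_EXTENSIONS)


def list_videos(top):
    """Walk `top` recursively and return the full paths of matching files."""
    return [
        os.path.join(root, name)
        for root, dirs, files in os.walk(top)
        for name in files
        if is_video(name)
    ]
```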

Upvotes: 2

tzot

Reputation: 96001

Python 2.x:

import os

def generic_tree_matching(rootdirname, filterfun):
    return [
        os.path.join(dirname, filename)
        for dirname, dirnames, filenames in os.walk(rootdirname)
        for filename in filenames
        if filterfun(filename)]

def matching_ext(rootdirname, extensions):
    "Case sensitive extension matching"
    return generic_tree_matching(
        rootdirname,
        lambda fn: fn.endswith(extensions))

def matching_ext_ci(rootdirname, extensions):
    "Case insensitive extension matching"
    try:
        extensions= extensions.lower()
    except AttributeError: # assume it's a sequence of extensions
        extensions= tuple(
            extension.lower()
            for extension in extensions)
    return generic_tree_matching(
        rootdirname,
        lambda fn: fn.lower().endswith(extensions))

Call either matching_ext or matching_ext_ci with the root folder and either a single extension or a tuple of extensions:

>>> matching_ext(".", (".mov", ".avi"))

Upvotes: 1

Jhonathan

Reputation: 1601

import os
import re

# Raw string so that \. is passed to the regex engine unchanged
pattern = re.compile(r'.*\.(mov|MOV|avi|mpg)$')

def fileList(source):
   matches = []
   for root, dirnames, filenames in os.walk(source):
       for filename in filter(lambda name:pattern.match(name),filenames):
           matches.append(os.path.join(root, filename))
   return matches

Upvotes: 3

user97370

Reputation:

I'd use os.walk to scan the directory, os.path.splitext to grab the suffix and filter them myself.

import os

suffixes = {'.AVI', '.MOV'}

def find_files(top='.'):  # hypothetical wrapper: yield is only valid inside a function
    for dirpath, dirnames, filenames in os.walk(top):
        for f in filenames:
            if os.path.splitext(f)[1] in suffixes:
                yield os.path.join(dirpath, f)

Upvotes: 7

Aleksandra Zalcman

Reputation: 3498

You can use os.walk() for recursive walking and glob.glob() or fnmatch.filter() for file matching:

Check this answer
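
A rough sketch of the os.walk() plus fnmatch.filter() combination, in case the linked answer is unavailable. The function name find_by_patterns and the default patterns are assumptions. On Linux, fnmatch.filter() matches case-sensitively.

```python
import fnmatch
import os


def find_by_patterns(top, patterns=("*.MOV", "*.AVI")):
    """Return full paths of files under `top` matching any of the patterns."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for pattern in patterns:
            # fnmatch.filter keeps only the names that match the shell-style pattern.
            # A name matching more than one pattern would be listed more than once.
            for name in fnmatch.filter(filenames, pattern):
                matches.append(os.path.join(dirpath, name))
    return matches
```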

Upvotes: 7

milancurcic

Reputation: 6241

Here is an example that lists the matching files in the current directory only (it does not recurse). You can extend it to specific paths.

import glob
movlist = glob.glob('*.mov')
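
Since the question asks for a recursive scan, glob can also descend into subdirectories when the pattern contains ** and recursive=True is passed (available since Python 3.5). A minimal sketch, with the hypothetical name glob_videos and assumed patterns:

```python
import glob
import os


def glob_videos(top="."):
    """Return paths under `top` (at any depth) matching *.MOV or *.AVI."""
    found = []
    for pattern in ("*.MOV", "*.AVI"):
        # "**" matches zero or more directory levels when recursive=True.
        found.extend(glob.glob(os.path.join(top, "**", pattern), recursive=True))
    return found
```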

Upvotes: 4
