Peter Wilson
Peter Wilson

Reputation: 1135

Python shutil copytree: use ignore function to keep specific files types

I'm trying to figure out how to copy CAD drawings (".dwg", ".dxf) from a source directory with subfolders to a destination directory and maintaining the original directory and subfolders structure.

I found the following answer from @martineau within the following post: Python Factory Function

from fnmatch import fnmatch, filter
from os.path import isdir, join
from shutil import copytree

def include_patterns(*patterns):
    """Factory function that can be used with copytree() ignore parameter.

    Arguments define a sequence of glob-style patterns
    that are used to specify what files to NOT ignore.
    Creates and returns a function that determines this for each directory
    in the file hierarchy rooted at the source directory when used with
    shutil.copytree().
    """
    def _ignore_patterns(path, names):
        keep = set(name for pattern in patterns
                            for name in filter(names, pattern))
        ignore = set(name for name in names
                        if name not in keep and not isdir(join(path, name)))
        return ignore
    return _ignore_patterns

# sample usage

copytree(src_directory, dst_directory,
         ignore=include_patterns('*.dwg', '*.dxf'))

Updated: 18:21. The following code works as expected, except that I'd like to ignore folders that don't contain any include_patterns('.dwg', '.dxf')

Upvotes: 31

Views: 50927

Answers (2)

VH2020
VH2020

Reputation: 29

import os
import shutil



def has_extension(src: str, ext: str) -> bool:
    _, actual_extension = os.path.splitext(src)
    return ext == actual_extension

def copy_if(src: str, dst: str, *, 
    include_ext_patterns: tuple[str, ...]) -> None:
    if any(has_extension(src, ext) for ext in include_ext_patterns):
        print(f"Copying {src} to {dst}")
        shutil.copy2(src, dst)


def copy_files(
    *, src: str, dst: str, include_ext_patterns: tuple[str, ...]) -> None:
    if not isinstance(include_ext_patterns, tuple):
    raise TypeError(f"include_ext_patterns should be {tuple} not {type(include_ext_patterns)}")
    shutil.copytree(
                str(src),
                str(dst),
                dirs_exist_ok=True,
                copy_function=partial(copy_if, include_ext_patterns=include_ext_patterns)
                )

To copy dir and files from src to destination:

copy_files(src="path/to/src/dir", 
           dst="path/to/dest/dir",
           include_ext_patterns=('.dwg', '.dxf'))

Upvotes: 0

Jan
Jan

Reputation: 952

shutil already contains a function ignore_patterns, so you don't have to provide your own. Straight from the documentation:

from shutil import copytree, ignore_patterns

copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))

This will copy everything except .pyc files and files or directories whose name starts with tmp.

It's a bit tricky (and not strictly necessary) to explain what's going on: ignore_patterns returns a function _ignore_patterns as its return value, this function gets stuffed into copytree as a parameter, and copytree calls this function as needed, so you don't have to know or care how to call this function _ignore_patterns. It just means that you can exclude certain unneeded cruft files (like *.pyc) from being copied. The fact that the name of the function _ignore_patterns starts with an underscore is a hint that this function is an implementation detail you may ignore.

copytree expects that the folder destination doesn't exist yet. It is not a problem that this folder and its subfolders come into existence once copytree starts to work, copytree knows how to handle that.

Now include_patterns is written to do the opposite: ignore everything that's not explicitly included. But it works the same way: you just call it, it returns a function under the hood, and coptytree knows what to do with that function:

copytree(source, destination, ignore=include_patterns('*.dwg', '*.dxf'))

Upvotes: 59

Related Questions