Clouseau

Reputation: 55

How to loop through each subdirectory and get all files within subdirectory as group in Python?

I need to iterate through the folders in a directory, one by one, and run a function on the files within each folder as a group. The files need to be passed together, along with their folder name, into the function. Sometimes a directory contains only one file, but often it contains two or more. Some folders also contain nested folders, and the files inside each nested folder need to be collected as their own group.

For example, my file structure can look like this:

├── root_folder
│   ├── subdir1
│   │   ├── file1.txt
│   │   ├── file2.txt
│   ├── subdir2
│   │   ├── file3.txt
│   ├── subdir3
│   │   ├── file4.txt
│   │   ├── file5.txt
│   │   ├── file6.txt
│   ├── subdir4
│   |   ├── subdir42
│   │   |   ├── file7.txt
│   │   ├── subdir43
│   │   |   ├── file8.txt
│   │   |   ├── file9.txt

I would like to get all files from each dir, and the directory name to pass into a function. Below are the desired results:

Simply put, I need to loop through each folder recursively, one by one, and use all files within the folder to pass into a function.

My current code only returns all the filepaths for each file, but like I said, I need to use them in groups (where each folder contains a group of files).

import os

indir = "C:/path/to/indir"
for path, directories, files in os.walk(indir):
    for file in files:
        fpath = os.path.join(path, file)
        print(fpath)

Upvotes: 0

Views: 781

Answers (1)

vht981230

Reputation: 4480

I think you can write a recursive function that traverses the directories and returns the non-directory files as a list, which you can then pass into your function.

import os

def traverse_directory(path):
    content = []
    files = os.listdir(path)
    for sf in files:
        f = os.path.join(path, sf) 
        if os.path.isdir(f):
            content += traverse_directory(f)
        else:
            content.append(f)
    
    # pass all files with the folder into a function
    # function_call(content)

    return content
    
print(traverse_directory('./root_folder'))
# prints something like the following (the order depends on what os.listdir returns):
# ['./root_folder/subdir4/subdir42/file7.txt', './root_folder/subdir4/subdir43/file8.txt', './root_folder/subdir4/subdir43/file9.txt', './root_folder/subdir3/file4.txt', './root_folder/subdir3/file5.txt', './root_folder/subdir3/file6.txt', './root_folder/subdir2/file3.txt', './root_folder/subdir1/file2.txt', './root_folder/subdir1/file1.txt']
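If you already have a flat list like the one above, one way to recover the per-folder groups is to bucket the paths by their parent directory. This is only a rough sketch: `group_by_folder` and the `flat` sample list are names I made up for illustration, not something from your code.

```python
import os
from collections import defaultdict

def group_by_folder(paths):
    """Group file paths by the directory that directly contains them."""
    groups = defaultdict(list)
    for p in paths:
        # dirname is the containing folder, basename is the file name
        groups[os.path.dirname(p)].append(os.path.basename(p))
    return dict(groups)

# Hypothetical flat list, shaped like the output above
flat = [
    "./root_folder/subdir1/file1.txt",
    "./root_folder/subdir1/file2.txt",
    "./root_folder/subdir4/subdir42/file7.txt",
]

for folder, names in group_by_folder(flat).items():
    print(folder, names)
```

Each key is the full folder path, so two folders that share a name in different places stay separate.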

Edit: Based on the expected function calls in your update, I think an approach similar to the one above should work. Instead of returning all files from all subdirectories, it collects only the non-directory files that sit directly in each directory and passes them, followed by the directory name, as arguments to the function call.

import os

# sample myfunc to print out the input files and directory
def myfunc(*args):
    files, dr = args[:-1], args[-1]
    print('myfunc:', files, dr)

def traverse_directory(path):
    content = []
    files = os.listdir(path)
    for sf in files:
        f = os.path.join(path, sf) 
        if os.path.isdir(f):
            traverse_directory(f)
        else:
            content.append(sf)
    
    if content:
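        # basename works with both '/' and '\\' separators; normpath drops any trailing slash
        content.append(os.path.basename(os.path.normpath(path)))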
        myfunc(*content)

traverse_directory('./root_folder')

Running this on your sample directory structure should give output like the following (the order depends on what os.listdir returns):

myfunc: ('file7.txt',) subdir42
myfunc: ('file8.txt', 'file9.txt') subdir43
myfunc: ('file4.txt', 'file5.txt', 'file6.txt') subdir3
myfunc: ('file3.txt',) subdir2
myfunc: ('file2.txt', 'file1.txt') subdir1
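As a possible alternative, the `os.walk` loop from your question already hands you the files one folder at a time, so you may not need the recursion at all. Here is a minimal sketch, assuming you want the bare folder name plus the file names; `iter_file_groups` is a hypothetical helper name, and the sorting is only there to make the order predictable.

```python
import os

def iter_file_groups(root):
    """Yield (folder_name, [file names]) for each folder under root that directly contains files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place makes os.walk visit subfolders in a fixed order
        dirnames.sort()
        if filenames:
            folder = os.path.basename(os.path.normpath(dirpath))
            yield folder, sorted(filenames)

# Each iteration is one group: swap print for your own function call
for folder, files in iter_file_groups("./root_folder"):
    print(folder, files)
```

Folders that only contain other folders (like subdir4 in your example) are skipped, because their `filenames` list is empty.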

Upvotes: 1
