I need to iterate through the folders in a directory, one by one, and run a function on the files in each folder as a group. Those files need to be passed together, along with the name of their folder, into a function. Sometimes a folder contains only one file, but often it contains two or more. Also, some folders contain further subfolders, and the files in each of those need to be extracted as a group as well.
For example, my file structure can look like this:
├── root_folder
│ ├── subdir1
│ │ ├── file1.txt
│ │ ├── file2.txt
│ ├── subdir2
│ │ ├── file3.txt
│ ├── subdir3
│ │ ├── file4.txt
│ │ ├── file5.txt
│ │ ├── file6.txt
│ ├── subdir4
│ │ ├── subdir42
│ │ │ ├── file7.txt
│ │ ├── subdir43
│ │ │ ├── file8.txt
│ │ │ ├── file9.txt
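To try the approaches in this thread without touching real data, the example tree can be recreated in a scratch directory. A minimal sketch, assuming Python 3; the helper `make_sample_tree` and the `SAMPLE_LAYOUT` mapping are hypothetical names, and the file contents are placeholders:

```python
import tempfile
from pathlib import Path

# Example layout: folder (relative to root_folder) -> files directly inside it.
SAMPLE_LAYOUT = {
    "subdir1": ["file1.txt", "file2.txt"],
    "subdir2": ["file3.txt"],
    "subdir3": ["file4.txt", "file5.txt", "file6.txt"],
    "subdir4/subdir42": ["file7.txt"],
    "subdir4/subdir43": ["file8.txt", "file9.txt"],
}


def make_sample_tree(base):
    """Create root_folder with the example layout under `base` and return its path."""
    root = Path(base) / "root_folder"
    for folder, names in SAMPLE_LAYOUT.items():
        folder_path = root / folder
        # parents=True also creates intermediate folders such as subdir4
        folder_path.mkdir(parents=True, exist_ok=True)
        for name in names:
            (folder_path / name).write_text("placeholder\n")
    return root


if __name__ == "__main__":
    # The temporary directory is deleted when the with-block exits.
    with tempfile.TemporaryDirectory() as tmp:
        root = make_sample_tree(tmp)
        for p in sorted(root.rglob("*.txt")):
            print(p.relative_to(root).as_posix())
```

Note that `subdir4` holds only subfolders, so it has no files of its own.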
For each folder, I would like to collect all of its files, together with the folder name, and pass them into a function. Below are the desired results:
myfunc("file1.txt", "file2.txt", "subdir1")
myfunc("file3.txt", "subdir2")
myfunc("file4.txt", "file5.txt", "file6.txt", "subdir3")
myfunc("file7.txt", "subdir42")
myfunc("file8.txt", "file9.txt", "subdir43")
Simply put, I need to loop through the folders recursively, one by one, and pass all the files in each folder into a function.
My current code only prints the path of each file individually, but as I said, I need to use them in groups (one group of files per folder).
import os

indir = "C:/path/to/indir"
for path, directories, files in os.walk(indir):
    for file in files:
        fpath = os.path.join(path, file)
        print(fpath)
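Because `os.walk` yields one `(path, directories, files)` tuple per folder, the `files` list in each iteration is already one folder's group; only the folder name still has to be attached. A minimal sketch adapting the loop above, assuming Python 3; `grouped_files` is a hypothetical helper, and `myfunc` here is a stand-in that just prints its arguments:

```python
import os


def myfunc(*args):
    # Stand-in: the last argument is the folder name, the rest are file names.
    *files, folder = args
    print("myfunc:", files, folder)


def grouped_files(indir):
    """Return (folder_name, sorted file names) for each folder that directly holds files."""
    groups = []
    for path, directories, files in os.walk(indir):
        # Sorting in place makes os.walk visit subfolders in a predictable order.
        directories.sort()
        if files:  # folders holding only subfolders (e.g. subdir4) form no group
            folder_name = os.path.basename(os.path.normpath(path))
            groups.append((folder_name, sorted(files)))
    return groups


if __name__ == "__main__":
    indir = "C:/path/to/indir"
    # os.walk silently yields nothing if indir does not exist.
    for folder, files in grouped_files(indir):
        myfunc(*files, folder)
```

If `myfunc` needs full paths rather than bare names, join each name with `os.path.join(path, name)` inside the loop.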
I think you can write a recursive function that traverses the directories and returns a list of all the non-directory files, which you can then pass into a function:
import os

def traverse_directory(path):
    content = []
    files = os.listdir(path)
    for sf in files:
        f = os.path.join(path, sf)
        if os.path.isdir(f):
            # recurse and merge the subfolder's files into this list
            content += traverse_directory(f)
        else:
            content.append(f)
    # pass all files with the folder into a function
    # function_call(content)
    return content

print(traverse_directory('./root_folder'))
# returns a flat list like this (the order follows os.listdir, which is arbitrary):
# ['./root_folder/subdir4/subdir42/file7.txt', './root_folder/subdir4/subdir43/file8.txt', './root_folder/subdir4/subdir43/file9.txt', './root_folder/subdir3/file4.txt', './root_folder/subdir3/file5.txt', './root_folder/subdir3/file6.txt', './root_folder/subdir2/file3.txt', './root_folder/subdir1/file2.txt', './root_folder/subdir1/file1.txt']
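If only this flat list is needed, the standard library can produce it without explicit recursion. A minimal sketch using `pathlib.Path.rglob` (a swapped-in alternative, not the approach above); `all_files` is a hypothetical name, and the result is sorted because directory listing order is arbitrary:

```python
from pathlib import Path


def all_files(root):
    """Return every regular file under `root`, at any depth, as sorted POSIX-style strings."""
    # rglob("*") matches files and folders at every level; keep only files.
    return sorted(p.as_posix() for p in Path(root).rglob("*") if p.is_file())


if __name__ == "__main__":
    root = "./root_folder"  # adjust to your own folder
    if Path(root).is_dir():
        print(all_files(root))
```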
Edit: Based on your updated expected function calls, a similar approach should work. Instead of returning all files from the subdirectories, the function collects only the non-directory files in the current folder and passes them, followed by the folder name, as arguments to the function; each subfolder is handled by its own recursive call.
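import os

# sample myfunc to print out the input files and directory
def myfunc(*args):
    files, dr = args[:-1], args[-1]
    print('myfunc:', files, dr)

def traverse_directory(path):
    content = []
    files = os.listdir(path)
    for sf in files:
        f = os.path.join(path, sf)
        if os.path.isdir(f):
            # each subfolder is handled as its own group
            traverse_directory(f)
        else:
            content.append(sf)
    # only call myfunc for folders that directly contain files
    if content:
        # os.path.basename handles the platform's separators (e.g. '\' on Windows), unlike path.split('/')
        content.append(os.path.basename(os.path.normpath(path)))
        myfunc(*content)

traverse_directory('./root_folder')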
Running this on your sample directory structure should give output like the following (the order follows os.listdir, which is arbitrary):
myfunc: ('file7.txt',) subdir42
myfunc: ('file8.txt', 'file9.txt') subdir43
myfunc: ('file4.txt', 'file5.txt', 'file6.txt') subdir3
myfunc: ('file3.txt',) subdir2
myfunc: ('file2.txt', 'file1.txt') subdir1
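A variation on the same recursion separates walking the tree from calling `myfunc`: a generator yields one `(file names, folder name)` group per folder, and the caller decides what to do with each. A minimal sketch, assuming Python 3; it swaps `os.listdir` plus `os.path.isdir` for `os.scandir`, `iter_file_groups` is a hypothetical name, and entries are sorted so the order is reproducible:

```python
import os


def iter_file_groups(path):
    """Yield (file_names, folder_name) for `path` and every nested folder that directly holds files."""
    # Read the whole listing first, so the directory handle is closed before recursing.
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)
        files = [e.name for e in entries if e.is_file()]
        subdirs = [e.path for e in entries if e.is_dir()]
    if files:  # skip folders such as subdir4 that only hold subfolders
        yield files, os.path.basename(os.path.normpath(path))
    for sub in subdirs:
        yield from iter_file_groups(sub)


def myfunc(*args):
    # Same sample myfunc as above: the last argument is the folder name.
    files, dr = args[:-1], args[-1]
    print('myfunc:', files, dr)


if __name__ == "__main__":
    root = "./root_folder"  # adjust to your own folder
    if os.path.isdir(root):
        for files, folder in iter_file_groups(root):
            myfunc(*files, folder)
```

Keeping the traversal free of side effects makes it easier to test and to reuse with a different function.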