Reputation: 79
I have some code that looks at a single folder and pulls out files. but now the folder structure has changed and i need to trawl throught the folders looking for files that match.
what the old code looks like
GSB_FOLDER = r'D:\Games\Gratuitous Space Battles Beta'
def get_module_data():
module_folder = os.path.join(GSB_FOLDER, 'data', 'modules')
filenames = [os.path.join(module_folder, f) for f in
os.listdir(module_folder)]
data = [parse_file(f) for f in filenames]
return data
But now the folder structure has changed to be like this
where folder1,2 or 3, could be any text string
how do i rewrite the code above to do this... I have been told about os.walk but I'm just learning Python... so any help appreciated
Upvotes: 6
Views: 21272
Reputation: 699
Created a function that kind of serves a general purpose of crawling through directory structure and returning files and/or paths that match pattern.
import os
import re
def directory_spider(input_dir, path_pattern="", file_pattern="", maxResults=500):
file_paths = []
if not os.path.exists(input_dir):
raise FileNotFoundError("Could not find path: %s"%(input_dir))
for dirpath, dirnames, filenames in os.walk(input_dir):
if re.search(path_pattern, dirpath):
file_list = [item for item in filenames if re.search(file_pattern,item)]
file_path_list = [os.path.join(dirpath, item) for item in file_list]
file_paths += file_path_list
if len(file_paths) > maxResults:
break
return file_paths[0:maxResults]
Example usages:
Upvotes: 2
Reputation: 88865
Nothing much changes you just call os.walk
and it will recursively go thru the directory and return files e.g.
for root, dirs, files in os.walk('/tmp'):
if os.path.basename(root) != 'modules':
continue
data = [parse_file(os.path.join(root,f)) for f in files]
Here I am checking files only in folders named 'modules' you can change that check to do something else, e.g. paths which have module somewhere root.find('/modules') >= 0
Upvotes: 10
Reputation: 10650
os.walk is a nice easy way to get the directory structure of everything inside a dir you pass it;
in your example, you could do something like this:
for dirpath, dirnames, filenames in os.walk("...GSB_FOLDER"):
#whatever you want to do with these folders
if "/data/modules/" in dirpath:
print dirpath, dirnames, filenames
try that out, should be fairly self explanatory how it works...
Upvotes: 2
Reputation: 5604
You can use os.walk
like @Anurag has detailed or you can try my small pathfinder
library:
data = [parse_file(f) for f in pathfinder.find(GSB_FOLDER), just_files=True]
Upvotes: 0