Mat Gritt

Reputation: 79

os.walk to crawl through folder structure

I have some code that looks at a single folder and pulls out files, but now the folder structure has changed and I need to trawl through the folders looking for files that match.

What the old code looks like:

import os

GSB_FOLDER = r'D:\Games\Gratuitous Space Battles Beta'

def get_module_data():
    module_folder = os.path.join(GSB_FOLDER, 'data', 'modules')

    filenames = [os.path.join(module_folder, f) for f in
                  os.listdir(module_folder)]

    data = [parse_file(f) for f in filenames]

    return data

But now the folder structure has changed to be like this:

where folder1, 2 or 3 could be any text string.

How do I rewrite the code above to do this? I have been told about os.walk, but I'm just learning Python, so any help is appreciated.

Upvotes: 6

Views: 21272

Answers (4)

synaptikon

Reputation: 699

I created a general-purpose function that crawls through a directory structure and returns the paths of files whose directory path and/or file name match a regular expression.

import os
import re

def directory_spider(input_dir, path_pattern="", file_pattern="", maxResults=500):
    file_paths = []
    if not os.path.exists(input_dir):
        raise FileNotFoundError("Could not find path: %s"%(input_dir))
    for dirpath, dirnames, filenames in os.walk(input_dir):
        if re.search(path_pattern, dirpath):
            file_list = [item for item in filenames if re.search(file_pattern,item)]
            file_path_list = [os.path.join(dirpath, item) for item in file_list]
            file_paths += file_path_list
            if len(file_paths) > maxResults:
                break
    return file_paths[0:maxResults]

Example usages:

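  • directory_spider('/path/to/find') --> returns up to the first 500 files found under the path, or raises FileNotFoundError if the path does not exist
  • directory_spider('/path/to/find', path_pattern="", file_pattern=r"\.py$", maxResults=10) --> returns up to 10 files whose names end in .py (the dot is escaped so it matches a literal ".")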

Upvotes: 2

Anurag Uniyal

Reputation: 88865

Nothing much changes: you just call os.walk and it will recursively go through the directory tree and return the files in each directory, e.g.

data = []
for root, dirs, files in os.walk('/tmp'):
    if os.path.basename(root) != 'modules':
        continue
    # extend rather than assign, so results from every matching folder are kept
    data.extend(parse_file(os.path.join(root, f)) for f in files)

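Here I am checking files only in folders named 'modules'. You can change that check to do something else, e.g. to match paths which have modules somewhere in them: root.find('/modules') >= 0. Note that on Windows the separator in root is a backslash, so test against os.sep rather than a hard-coded '/'.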
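A minimal sketch of that looser check, assuming you want the files from every directory that has a component named exactly modules anywhere in its path. The helper name files_under_named_folder is made up for illustration. Comparing whole path components means a folder such as modules_old is not picked up by accident.

```python
import os

def files_under_named_folder(top, folder_name='modules'):
    """Collect files from every directory that has folder_name in its path.

    Hypothetical helper for illustration; not part of the original code.
    """
    found = []
    for root, dirs, files in os.walk(top):
        # Split the path into components so that only an exact
        # component match counts (e.g. 'modules', not 'modules_old').
        parts = os.path.normpath(root).split(os.sep)
        if folder_name in parts:
            found.extend(os.path.join(root, f) for f in files)
    return found
```

Each returned path can then be passed to parse_file as in the loop above.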

Upvotes: 10

will

Reputation: 10650

os.walk is a nice, easy way to get the directory structure of everything inside a directory you pass it.

In your example, you could do something like this:

for dirpath, dirnames, filenames in os.walk("...GSB_FOLDER"):
  # whatever you want to do with these folders
  if "/data/modules/" in dirpath:
    print(dirpath, dirnames, filenames)

Try that out; it should be fairly self-explanatory how it works.
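For the function in the question, one possible rewrite is sketched below, assuming the files still live somewhere beneath data/modules, at any depth. Here parse is a stand-in parameter for the question's parse_file, which is not shown, and the root parameter is added only so the sketch can be run against any folder.

```python
import os

GSB_FOLDER = r'D:\Games\Gratuitous Space Battles Beta'

def get_module_data(parse, root=GSB_FOLDER):
    """Parse every file found at any depth below <root>/data/modules.

    `parse` stands in for the question's parse_file (an assumption).
    """
    module_folder = os.path.join(root, 'data', 'modules')
    data = []
    # os.walk yields (dirpath, dirnames, filenames) for module_folder
    # and then for each subfolder, whatever the subfolders are called.
    for dirpath, dirnames, filenames in os.walk(module_folder):
        for name in filenames:
            data.append(parse(os.path.join(dirpath, name)))
    return data
```

Called as get_module_data(parse_file), this keeps the shape of the old function; only the os.listdir call has been swapped for a walk over the subfolders.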

Upvotes: 2

John Keyes

Reputation: 5604

You can use os.walk as @Anurag has detailed, or you can try my small pathfinder library:

data = [parse_file(f) for f in pathfinder.find(GSB_FOLDER, just_files=True)]

Upvotes: 0
