Liban West

Reputation: 43

Run multiple python files in all subdirectories

I have a directory containing multiple subdirectories, each holding a different scraper. How would you go about writing a script that will cd into each of the subdirectories, run the scraper, cd out, and then continue to the next one? What would be the best way to do this, if it is possible?

Example of how the directory looks:

- All_Scrapers (parent dir)
   - Scraper_one (sub dir folder)
       - scraper.py
   - Scraper_two (sub dir folder)
       - scraper.py
   - Scraper_three (sub dir folder)
       - scraper.py
   - all.py

All the scrapers have a main function:

if __name__ == "__main__":
    main()

Upvotes: 0

Views: 1003

Answers (2)

norok2

Reputation: 26886

One way of doing this is to walk through your directories and programmatically import the modules you need.

Assuming that the Scraper_X folders all sit in one subdirectory named scrapers, and that the batch_run.py script is in the directory containing scrapers (hence, at the same path level as scrapers), the following script will do the trick:

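import os
import importlib

base_subdir = 'scrapers'

for root, subdirs, filenames in os.walk(base_subdir):
    for subdir in subdirs:
        # Skip __pycache__ and other double-underscore directories.
        if not subdir.startswith('__'):
            print(root, subdir)
            # Builds a dotted name such as 'scrapers.Scraper_one.scraper'.
            submodule = importlib.import_module('.'.join((root, subdir, 'scraper')))
            submodule.main()
    # Stop after the top level: deeper in the tree, root contains path
    # separators and no longer forms a valid dotted module name.
    break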

EDIT

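If the script is inside the base_subdir path (like all.py in the question), the code can be adapted by slightly changing how import_module() is called.

import os
import importlib

base_subdir = '.'

for root, subdirs, filenames in os.walk(base_subdir):
    for subdir in subdirs:
        # Skip __pycache__ and other double-underscore directories.
        if not subdir.startswith('__'):
            print(root, subdir)
            # The name 'Scraper_one.scraper' is absolute, so it is looked up on
            # sys.path; root is only consulted for names starting with a dot.
            script = importlib.import_module('.'.join((subdir, 'scraper')), root)
            script.main()
    break  # stop after the top level; deeper folders are not scrapers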
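If the folder names are not valid Python identifiers, or the scrapers expect to run from inside their own folder (the "cd in, run, cd out" of the question), an alternative is to load each scraper.py by its file path. The following is only a sketch of that different technique, not the approach used above; the function name run_scrapers and the module-naming scheme are made up for illustration, and it assumes every scraper.py defines main().

```python
import importlib.util
import os
from pathlib import Path


def run_scrapers(base_dir, filename="scraper.py"):
    """Call main() of every <base_dir>/<subdir>/<filename>; return results by folder name."""
    results = {}
    for script in sorted(Path(base_dir).resolve().glob(f"*/{filename}")):
        # A unique module name per folder keeps the scrapers from shadowing each other.
        spec = importlib.util.spec_from_file_location(f"{script.parent.name}_scraper", script)
        module = importlib.util.module_from_spec(spec)
        previous_cwd = os.getcwd()
        os.chdir(script.parent)  # "cd in"
        try:
            # Executes the file; its `if __name__ == "__main__"` guard does not fire.
            spec.loader.exec_module(module)
            results[script.parent.name] = module.main()
        finally:
            os.chdir(previous_cwd)  # "cd out", even if the scraper raised
    return results
```

Calling run_scrapers('.') from all.py would then run each scraper in turn. Note that a scraper importing sibling files from its own folder would additionally need that folder on sys.path, which this sketch does not handle.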

EDIT 2

Some explanations:

How is import_module() being used?

The import_module() line is what actually does the job. Roughly speaking, when it is used with only one argument, i.e.

alias = importlib.import_module("my_module.my_submodule")

it is equivalent to:

import my_module.my_submodule as alias

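Instead, when used with two arguments and a name that starts with a dot, i.e.

alias = importlib.import_module(".my_submodule", "my_module")

it is equivalent to:

from my_module import my_submodule as alias

This second form is what relative imports need: a name starting with . or .. is resolved against the package given as the second argument. If the name does not start with a dot, the second argument is ignored.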
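A quick way to check the two call forms is to try them on a package that ships with Python. The snippet below uses the standard-library json package purely as a stand-in for my_module; note that the two-argument form only acts as a relative import when the name begins with a dot.

```python
import importlib

# Absolute form: one argument, full dotted name.
absolute = importlib.import_module("json.decoder")

# Relative form: the name starts with a dot, the second argument is the anchor package.
relative = importlib.import_module(".decoder", "json")

print(absolute is relative)  # True
print(relative.__name__)     # json.decoder
```

Both calls return the same module object, because the relative name is resolved to json.decoder before the import happens.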

What is if not subdir.startswith('__'): doing?

When you import a module, Python compiles it to bytecode and caches the result as .pyc files under the __pycache__ directory. The aforementioned line ensures that, when walking through the directories, __pycache__ (actually, any directory starting with __) is not processed as if it contained modules to import. Other kinds of filtering may be equally valid.
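As one example of such alternative filtering, the sketch below prunes double-underscore directories from the walk itself and keeps only folders that actually contain a scraper.py. The helper name scraper_dirs and the throwaway directory tree are made up for illustration.

```python
import os
import tempfile


def scraper_dirs(base_dir):
    """Yield directories under base_dir that contain a scraper.py file."""
    for root, subdirs, filenames in os.walk(base_dir):
        # Editing subdirs in place stops os.walk from descending into
        # __pycache__ and any other double-underscore directory.
        subdirs[:] = sorted(d for d in subdirs if not d.startswith("__"))
        if "scraper.py" in filenames:
            yield root


# Build a temporary tree to show the effect.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Scraper_one", "Scraper_two", "__pycache__"):
        os.makedirs(os.path.join(tmp, name))
        open(os.path.join(tmp, name, "scraper.py"), "w").close()
    found = [os.path.basename(path) for path in scraper_dirs(tmp)]
    print(found)  # ['Scraper_one', 'Scraper_two']
```

The in-place assignment to subdirs only has this pruning effect with the default topdown=True.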

Upvotes: 3

sophros

Reputation: 16660

You may want to look at the os.walk function, which traverses the directory tree; at each directory you can then run the script (or its main function, if you wrap the contents of the script in one).

Example code:

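import os
import subprocess
import sys
for root, dirs, files in os.walk(".", topdown=False):
    if "scraper.py" in files:  # run each script from inside its own folder
        subprocess.run([sys.executable, "scraper.py"], cwd=root, check=True)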

Upvotes: 0
