Reputation: 43
I have directory containing multiple subdirectories of different scraper. How would you go about writing script that will cd into each of the subdirectories and run the scraper, cd out then continue to the next one what would be the best way to do this if it possible?
Example of the how the directory looks:
- All_Scrapers (parent dir)
- Scraper_one (sub dir folder)
- scraper.py
- Scraper_two (sub dir folder)
- scraper.py
- Scraper_three (sub dir folder)
- scraper.py
- all.py
all the scrapers have main function
if __name__ == "__main__":
main()
Upvotes: 0
Views: 1003
Reputation: 26886
One way of doing this is to walk through your directories and programmactically import the modules you need.
Assuming that the Scraper X folder
s are in the same subdirectory scrapers
and you have the batch_run.py
script in the directory containing scrapers
(hence, at the same path level), the following script will do the trick:
import os
import importlib
base_subdir = 'scrapers'
for root, subdirs, filenames in os.walk(base_subdir):
for subdir in subdirs:
if not subdir.startswith('__'):
print(root, subdir)
submodule = importlib.import_module('.'.join((root, subdir, 'scraper')))
submodule.main()
If the script is inside the base_subdir
path, the code can be adapted by changing a bit how the import_module()
is called.
import os
import importlib
base_subdir = '.'
for root, subdirs, filenames in os.walk(base_subdir):
for subdir in subdirs:
if not subdir.startswith('__'):
print(root, subdir)
script = importlib.import_module('.'.join((subdir, 'scraper')), root)
script.main()
Some explanations:
import_module()
is being used?The import_module()
line, is what is actually doing the job. Roughly speaking, when it is used with only one argument, i.e.
alias = importlib.import_module("my_module.my_submodule")
it is equivalent to:
import my_module.my_submodule as alias
Instead, when used with two argumens, i.e.
alias = importlib.import_module("my_submodule", "my_module")
it is equivalent to:
from my_module import my_submodule as alias
This second form is very convenient for relative imports (i.e. imports using .
or ..
special directories).
if not subdir.startswith('__'):
doing?When you import a module, Python will generate some bytecode to be interpreted and it will cache the result as .pyc
files under the __cache__
directory. The aforementioned line will avoid that, when walking through the directories, __cache__
(actually, any directory starting with __
) will be processed as if it would contain modules to import. Other kind of filtering may be equally valid.
Upvotes: 3
Reputation: 16660
You may want to check os.walk
function that traverses the directory tree and at each directory run the script (or the main
function that you can wrap the contents of the script into).
An example code would be:
import os
for root, dirs, files in os.walk(".", topdown=False):
scraper_main()
Upvotes: 0