Palmetto_Girl86

Reputation: 1007

Python -- "Batch Processing" of multiple existing scripts

I have written three simple scripts (which I will not post here, as they are part of my dissertation research) that are all in working order.

What I would like to do now is write a "batch-processing" script for them. I have many (read as potentially tens of thousands) of data files on which I want these scripts to act.

My questions about this process are as follows:

  1. What is the most efficient way to go about this sort of thing?
  2. I am relatively new to programming. Is there a simple way to do this, or is this a very complex endeavor?

Before anyone downvotes this question as "unresearched" or whatever negative connotation comes to mind, PLEASE just offer help. I have spent days reading documentation and following leads from Google searches, and it would be most appreciated if a human being could offer some input.

Upvotes: 3

Views: 18624

Answers (2)

Ethan Furman

Reputation: 69190

If you just need to have the scripts run, probably a shell script would be the easiest thing.

If you want to stay in Python, the best way would be to have a main() (or somesuch) function in each script (and have each script importable), have the batch script import the subscript and then run its main.

If staying in Python:

  - your three scripts must have the .py ending to be importable
  - they should either be in Python's search path, or the batch control script can set the path
  - they should each have a main function (or whatever name you choose) that will activate that script

For example:

batch_script

import sys
sys.path.insert(0, '/location/of/subscripts')

import first_script
import second_script
import third_script

first_script.main('/location/of/files')
second_script.main('/location/of/files')
third_script.main('/location/of/files')

example sub_script

import os
import sys
import some_other_stuff
SOMETHING_IMPORTANT = 'a value'

def do_frobber(a_file):
   ...

def main(path_to_files):
    all_files = os.listdir(path_to_files)
    for file in all_files:
        do_frobber(os.path.join(path_to_files, file))

if __name__ == '__main__':
    main(sys.argv[1])

This way, your subscript can be run on its own, or called from the main script.
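To make the pattern concrete, here is a minimal single-file sketch. The names count_lines and run_batch are hypothetical, invented only for illustration: count_lines stands in for do_frobber, and run_batch plays the role of the batch script calling each subscript's main in turn. Your real scripts would live in separate importable .py files as described above.

```python
import os
import tempfile


def count_lines(a_file):
    # Hypothetical per-file step, standing in for do_frobber
    with open(a_file) as handle:
        return sum(1 for _ in handle)


def main(path_to_files, process=count_lines):
    # Apply `process` to every regular file directly inside path_to_files
    results = {}
    for name in sorted(os.listdir(path_to_files)):
        full_path = os.path.join(path_to_files, name)
        if os.path.isfile(full_path):  # skip subdirectories
            results[name] = process(full_path)
    return results


def run_batch(steps, path_to_files):
    # Call each step (a main-style function) on the same directory, in order
    return [step(path_to_files) for step in steps]


if __name__ == '__main__':
    # Small demonstration on a throwaway directory
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, 'sample.txt'), 'w') as handle:
            handle.write('one\ntwo\n')
        print(run_batch([main, main], folder))
```

In a real batch script the list passed to run_batch would be something like [first_script.main, second_script.main, third_script.main]. Skipping anything that is not a regular file is an extra precaution I added here; os.listdir also returns subdirectory names.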

Upvotes: 4

Tim B

Reputation: 3143

You can write a batch script in Python using os.walk() to generate a list of the files and then process them one by one with your existing Python programs.

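import os, re, subprocess, sys

for root, dirs, files in os.walk('/path/to/files'):
    for f in files:
        # only process files whose names end in ".dat"
        if re.match(r'.*\.dat$', f):
            data_file = os.path.join(root, f)
            # placeholder script names: substitute your own existing scripts
            subprocess.run([sys.executable, 'existing_script1.py', data_file], check=True)
            subprocess.run([sys.executable, 'existing_script2.py', data_file], check=True)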

If there are other files in the directory you might want to add a regex to ensure you only process the files you're interested in.

EDIT - added regular expression to ensure only files ending ".dat" are processed.
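If you would rather keep the file search separate from the processing, the walk-and-filter step can be pulled into a small generator. This is only a sketch: find_data_files and DAT_PATTERN are names I made up for illustration, and the ".dat" suffix is carried over from the example above.

```python
import os
import re

# Matches file names ending in ".dat", as in the loop above
DAT_PATTERN = re.compile(r'.*\.dat$')


def find_data_files(top, pattern=DAT_PATTERN):
    # Yield the full path of every file under `top` whose name matches
    for root, dirs, files in os.walk(top):
        for name in files:
            if pattern.match(name):
                yield os.path.join(root, name)
```

You would then loop with "for data_file in find_data_files('/path/to/files'):" and call your scripts inside the loop. Because it is a generator, paths are produced as the walk proceeds rather than being collected into one big list first, which may help with tens of thousands of files.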

Upvotes: 2
