Reputation: 197
I want to do some parallel computing for the first time and I don't know exactly where to start.
The problem is that I have a huge list of files (around 7000 CSV files) that I want to process into a single output file. For this task I would like to use the campus cluster, which works with Torque PBS.
The closest question to what I want to achieve that I've found on SO so far is this one, with the main difference that I should use Torque (do I really?).
So, to keep it short, my question is: how could I implement the solution of the cited question using Torque PBS?
Upvotes: 1
Views: 438
Reputation: 197
Well, I managed to do it the following way:
Assuming there's a serial Python script named process.py which handles 100 of the CSV files at a time, we then need a file call_pyprocess.pbs which calls process.py with the following syntax:
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -o out.varx
#PBS -e error.varx
source activate p2.7                    # only if a specific Python environment needs to be activated
python /path/to/file/process.py varx    # varx is the iteration number
Note that the process.py script requires an argument parser in order to use varx as an internal variable (a sketch is given below).
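As a minimal sketch of what that could look like, assuming the CSV files sit in a single directory and each job takes a contiguous block of 100 of them selected by the iteration number (the data path, the use of pandas, and the partial_XX.csv output naming are assumptions here, not part of the original setup):

import argparse
import glob

import pandas as pd

parser = argparse.ArgumentParser(description="Process one block of 100 CSV files.")
parser.add_argument("varx", type=int, help="iteration number passed in by the PBS script")
args = parser.parse_args()

# Hypothetical data location; sorted so every job sees the same file ordering.
files = sorted(glob.glob("/path/to/data/*.csv"))
block = files[args.varx * 100:(args.varx + 1) * 100]

# Guard against an empty block (the last iteration may run past the file count),
# then concatenate this job's block and write one partial result per job.
if block:
    frames = [pd.read_csv(f) for f in block]
    pd.concat(frames).to_csv("partial_%02d.csv" % args.varx, index=False)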
Then the jobs are submitted from bash with the following loop:
for i in {00..70}; do
    cp call_pyprocess.pbs temp.pbs
    perl -pi -e "s/varx/$i/" temp.pbs
    qsub temp.pbs
done
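Once all the jobs have finished, the partial outputs still need to be combined into the single final file. Assuming the partial_XX.csv naming from the sketch above and that all partials share the same header, something along these lines would do it:

# Keep the header from the first partial file only, then append the data rows of the rest.
head -n 1 partial_00.csv > combined.csv
for f in partial_*.csv; do tail -n +2 "$f" >> combined.csv; done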
Upvotes: 1