Reputation: 197
I want to do some parallel computing for the first time and I don't know exactly where to start.
The problem is that I have a huge list of files (around 7000 CSV files) that I want to process into a single output file. For this task I would like to use the campus cluster, which works with Torque PBS.
The closest question to what I want to achieve that I've found on SO so far is this one, with the main difference that I should use Torque (do I really?).
So, to keep it short, my question is: how could I implement the solution of the cited question using Torque PBS?
Upvotes: 1
Views: 438
Reputation: 197
Well, I managed to do it the following way:
Assuming there's a serial Python script named process.py which handles 100 of the CSV files at a time, we then need a file call_pyprocess.pbs which calls process.py with the following syntax:
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -o out.varx
#PBS -e error.varx
source activate p2.7                    # only if a specific Python environment needs to be activated
python /path/to/file/process.py varx    # varx is the iteration number
Note that the process.py script requires an argument parser in order to use varx as an internal variable (a sketch is given below).
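As a minimal sketch of what that could look like, assuming the CSV files sit in a single directory and each job takes a contiguous block of 100 of them selected by the iteration number (the data path, the use of pandas, and the partial_XX.csv output naming are assumptions here, not part of the original setup):

import argparse
import glob

import pandas as pd

parser = argparse.ArgumentParser(description="Process one block of 100 CSV files.")
parser.add_argument("varx", type=int, help="iteration number passed in by the PBS script")
args = parser.parse_args()

# Hypothetical data location; sorted so every job sees the same file ordering.
files = sorted(glob.glob("/path/to/data/*.csv"))
block = files[args.varx * 100:(args.varx + 1) * 100]

# Guard against an empty block (the last iteration may run past the file count),
# then concatenate this job's block and write one partial result per job.
if block:
    frames = [pd.read_csv(f) for f in block]
    pd.concat(frames).to_csv("partial_%02d.csv" % args.varx, index=False)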
Then the jobs are submitted from bash with the following loop:
for i in {00..70}; do
    cp call_pyprocess.pbs temp.pbs
    perl -pi -e "s/varx/$i/" temp.pbs
    qsub temp.pbs
done
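Once all the jobs have finished, the partial outputs still need to be combined into the single final file. Assuming the partial_XX.csv naming from the sketch above and that all partials share the same header, something along these lines would do it:

# Keep the header from the first partial file only, then append the data rows of the rest.
head -n 1 partial_00.csv > combined.csv
for f in partial_*.csv; do tail -n +2 "$f" >> combined.csv; done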
Upvotes: 1