Reputation: 23
I have written an MPI-based C code that I use to perform numerical simulations in parallel. Due to some poor design on my part, I have built some inherent MPI dependencies into the code (array structures, MPI-IO). This means that if I want to run my code in serial, I have to invoke
mpiexec -n 1 c_exe
Main problem: I use my C code within a Python workflow, simplified in the loop below.
import os
import subprocess

homedir = os.getenv('PBS_O_WORKDIR')
nevents = 100

for ievent in range(nevents):
    perform_workflow_management()      # prepare the input for this event
    os.chdir(str(ievent))              # each event runs in its own subdirectory
    subprocess.call('mpiexec -n 1 c_exe', shell=True)
    os.chdir(homedir)
The Python workflow is primarily for management and makes calls to the C code which performs the numerically intensive work.
The tasks within the Python for loop are independent; consequently, I would like to employ an embarrassingly parallel scheme to parallelize the loop over events. Benchmarks indicate that parallelizing the loop over events will be faster than a serial loop with parallel MPI calls. Furthermore, I am running this on a PBS-Torque cluster.
I am at a loss about how to do this effectively. The complication seems to arise from the MPI call to my C code and the assignment of multiple MPI tasks.
Things I have tried in some form:
Wrappers to pbsdsh - incur problems with processor assignment.
MPMD with mpiexec (see the sketch below) - theoretically does what I would like, but fails because all processes seem to share MPI_COMM_WORLD. My C code establishes a Cartesian topology for domain-based parallelism; conflicts arise here.
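For reference, the MPMD launch I attempted looks roughly like the sketch below (two events shown; in reality there would be one -n 1 block per event):

mpiexec -n 1 c_exe : -n 1 c_exe

Every c_exe instance started this way joins the same MPI_COMM_WORLD, which is exactly where the topology conflicts come from.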
Does anyone have suggestions on how I might deploy this in an embarrassingly parallel fashion? Ideally, I would like to submit a job request
qsub -l nodes=N:ppn=1,walltime=XX:XX:XX go_python_job.bash
where N is the number of processors. On each process I would then like to be able to submit independent mpiexec calls to my C code.
I'm aware that part of the issue is down to design flaws, but if I could find a solution without having to refactor large parts of the code, that would be advantageous.
Upvotes: 1
Views: 1052
Reputation: 74395
First of all, with any decent MPI implementation you don't have to use mpiexec to start a singleton MPI job - simply run the executable (MPI standard, §10.5.2 Singleton MPI_INIT). It works at least with Open MPI and the MPICH family.
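For a single rank, the two invocations below are therefore equivalent (a sketch, assuming c_exe was built against Open MPI or MPICH):

mpiexec -n 1 ./c_exe   # explicit launcher
./c_exe                # singleton MPI_INIT; the process initialises MPI on its own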
Second, any decent DRM (distributed resource manager, a.k.a. batch queueing system) supports array jobs. Those are the DRM equivalent of SPMD - multiple jobs with the same job file.
To get an array job with Torque, pass qsub the -t from-to option, either on the command line or in the job script:
#PBS -t 1-100
...
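The command-line equivalent would look something like this (the script name and walltime placeholder are copied from the question; the array range of 100 matches the number of events):

qsub -t 1-100 -l nodes=1:ppn=1,walltime=XX:XX:XX go_python_job.bash

Note that each array instance now requests just a single processor; the scheduler runs as many instances concurrently as free resources allow.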
Then, in your Python script, obtain the value of the PBS_ARRAYID environment variable and use it to differentiate between the different instances:
import os
import subprocess

homedir = os.getenv('PBS_O_WORKDIR')
ievent = os.getenv('PBS_ARRAYID')    # '1' .. '100', one value per array instance

perform_workflow_management()
os.chdir(ievent)                     # assumes event directories named '1' .. '100'
subprocess.call('./c_exe', shell=True)   # singleton run - no mpiexec needed
os.chdir(homedir)
As already mentioned by @Zulan, this allows the job scheduler to better exploit the resources of the cluster via backfilling (if your Torque instance is paired with Maui or a similar advanced scheduler).
The advantage of array jobs is that, although from your perspective they look and work (mostly) like a single job, the scheduler still sees them as separate jobs and schedules them individually.
One possible disadvantage of this approach is that if jobs are scheduled exclusively, i.e. no two jobs can share a compute node, then the utilisation will be quite low unless your cluster nodes have just one single-core CPU each (very unlikely nowadays).
Upvotes: 2