umbe1987

Reputation: 3588

Using GNU Parallel with a variable defined within the terminal session

I am trying to launch an Octave script in parallel using GNU Parallel. Everything works fine, but I have a question about exported variables. Before using GNU Parallel, my workflow was to open a terminal, run export OMP_NUM_THREADS=1, and then execute my Octave script. This way I allocate 1 thread to BLAS, which Octave uses. When using GNU Parallel, is running export OMP_NUM_THREADS=1 before calling parallel enough, or should I do something differently? I read about env_parallel, but I am not sure whether I need it for my use case, or how to use it if I do.

Without GNU Parallel, I open a terminal and run:

export OMP_NUM_THREADS=1
octave --gui

With GNU Parallel, I now open a terminal and run:

export OMP_NUM_THREADS=1
readlink -f ./data/*.csv | parallel "octave validation.m {}"

Basically, I am processing the CSV files in a directory in parallel with validation.m, and I want to make sure BLAS uses only 1 thread.

Upvotes: 1

Views: 51

Answers (2)

chrslg

Reputation: 13491

As far as I understand, the point of env_parallel is to export local shell variables, aliases, functions, options, etc. to the commands that parallel executes.

So it only makes sense when what you want to parallelize is shell code, and only when that code depends on things defined in your current shell. A standalone script that merely happens to be written in bash is, from the parallelization point of view, just another opaque program.

If all you want to do with parallel is run some opaque programs (meaning you don't intend to meddle with Octave's internals and alter its variables and functions; and even if you wanted to, env_parallel wouldn't help, since Octave is not bash code), then env_parallel has nothing to do with your problem.

At least, that is what I understand from the examples at the link you gave: they are all about "how do I find my functions, aliases, local shell variables, etc. when I use parallel to fork some bash code".
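A minimal sketch of that distinction, assuming a POSIX sh. GNU Parallel runs each job in a freshly started shell, so a plain `sh -c` child stands in for one job here; the names LOCAL_ONLY, EXPORTED and my_helper are hypothetical. Only the exported variable crosses the process boundary, which is why something like OMP_NUM_THREADS needs no help, while unexported variables and functions are what env_parallel is for.

```shell
#!/bin/sh
# Sketch: what a freshly started child shell (a stand-in for one
# GNU Parallel job) can see from the shell that launched it.
# LOCAL_ONLY, EXPORTED and my_helper are hypothetical names.

LOCAL_ONLY=hello              # ordinary shell variable, not exported
export EXPORTED=world         # environment variable, inherited by children
my_helper() { echo "hi"; }    # shell function, exists only in this shell

local_seen=$(sh -c 'echo "${LOCAL_ONLY:-<unset>}"')
exported_seen=$(sh -c 'echo "${EXPORTED:-<unset>}"')
helper_seen=$(sh -c 'if command -v my_helper >/dev/null 2>&1; then echo yes; else echo no; fi')

echo "LOCAL_ONLY: $local_seen"      # <unset>
echo "EXPORTED:   $exported_seen"   # world
echo "my_helper:  $helper_seen"     # no
```

Running `export LOCAL_ONLY` first would make the child see hello. bash can also export functions with `export -f`, but env_parallel automates copying functions, aliases and unexported variables into each job.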

Since Octave is written in C++ (I think; at least the main program is), the equivalent for your case would be a tool that parallelizes C++ code while sharing C++ variables, classes, functions, etc. with it. env_parallel cannot do that, of course, since it only handles shell code.

So, in your case, whatever language Octave is written in, you don't really care about its internals. You just happen to know that at some point it (or the BLAS/OpenMP library it uses) will call getenv("OMP_NUM_THREADS"), or its equivalent in that language, and you want to be sure what the result will be. And, as tripleee already said, that is what environment variables are for. As long as the Octave processes are descendants of a process in which you set the environment variable OMP_NUM_THREADS, that getenv call will see it.

In other words, running export OMP_NUM_THREADS=1 before calling plain parallel is all you need; env_parallel is not meant for this.
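A minimal way to check this yourself, assuming GNU Parallel is installed (the moreutils program also called parallel uses a different syntax). `echo` stands in for `octave validation.m`, and the three file names are placeholders; each job just reports the value of OMP_NUM_THREADS its own shell inherited.

```shell
#!/bin/sh
# Sketch: confirm that every GNU Parallel job inherits OMP_NUM_THREADS.
# Assumes GNU Parallel; 'echo' stands in for 'octave validation.m'.
export OMP_NUM_THREADS=1

# The single quotes stop *this* shell from expanding $OMP_NUM_THREADS,
# so each job's own shell expands it from the environment it inherited.
# -k keeps the output in input order.
out=$(printf '%s\n' a.csv b.csv c.csv |
  parallel -k 'echo "{}: OMP_NUM_THREADS=$OMP_NUM_THREADS"')

printf '%s\n' "$out"
```

With double quotes, as in the original command, `$OMP_NUM_THREADS` would be expanded once by the calling shell before parallel ever sees it. The printed value would still be 1, but the check would no longer demonstrate inheritance.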

Upvotes: 0

tripleee

Reputation: 189789

export variable=value will set variable to value and mark it for exporting to subprocesses. Those include parallel and octave and anything else you run from within that shell (barring corner cases like running env to override what's otherwise in the environment).

In other words, the exported variable is visible to all descendants (child processes, their children, etc.) of the shell where it was set.
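A quick sketch of that inheritance chain, assuming a POSIX sh: a child shell and a grandchild shell both see the exported value, and `env` shows the corner case of overriding it for a single command without touching the parent's value.

```shell
#!/bin/sh
# Sketch: an exported variable is visible to every descendant process.
export OMP_NUM_THREADS=1

# Child: a shell started directly by this one.
child=$(sh -c 'echo "$OMP_NUM_THREADS"')

# Grandchild: the escaping keeps both the outer and the middle shell from
# expanding the variable, so only the innermost shell reads it.
grandchild=$(sh -c "sh -c 'echo \"\$OMP_NUM_THREADS\"'")

# Corner case: env overrides the value for one command only.
overridden=$(env OMP_NUM_THREADS=4 sh -c 'echo "$OMP_NUM_THREADS"')

echo "child=$child grandchild=$grandchild overridden=$overridden parent=$OMP_NUM_THREADS"
# prints: child=1 grandchild=1 overridden=4 parent=1
```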

Perhaps read up on the Unix process model if you need more details, but this is not very complicated.

Upvotes: 3
