Aaron Perry

Reputation: 1041

Multithreading/Parallel Bash Scripts in Unix Environment

I have multiple bash scripts that I have tried to "parallelize" within a master bash script.

Bash Script:

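#!/bin/bash
SHELL=/bin/bash

# First batch: start a.sh and b.sh in the background, then wait for both.
bash /home/.../a.sh &
bash /home/.../b.sh &
wait
# Second batch: start c.sh, d.sh, and e.sh together, then wait for all three.
bash /home/.../c.sh &
bash /home/.../d.sh &
bash /home/.../e.sh &
wait
echo "Done paralleling!"
exit 0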

I have run the script both normally (without ampersands) and with ampersands, and I am not seeing any appreciable difference in processing time. That leads me to believe that something is not coded correctly, or not in the most efficient way.

Upvotes: 1

Views: 439

Answers (2)

Mike Robinson

Reputation: 8945

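In classic computer-science terms, the extreme case of resource contention, where the system spends more time fighting over a resource than doing useful work, is referred to as "thrashing."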

(In the good ol' days, when a 5-megabyte disk drive might be the size of a small washing machine, we used to call it "Maytag Mode," since the poor thing looked like a Maytag washing-machine on the "spin" cycle!)

If you graph completion time against load under contention, the curve slopes gently upward, then abruptly forms an "elbow": it goes almost straight up. We call that "hitting the wall."

An interesting thing to fiddle around with on this script (if you're just curious ...) is to put wait statements in several places. (Be sure you're doing this correctly ...) Allow, say, two instances to run, wait for both of them to complete, then start two more, and so on. See if that's usefully faster and, if it is, try three at a time. And so on. You may find a "sweet spot."

Or ... not. (Don't spend too much time with this. It doesn't look like it's going to be worth it.)
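If you do want to try the batching experiment described above, here is a minimal sketch. The function name run_batched and the "sleep 1" stand-in jobs are assumptions for illustration only; substitute your own scripts. It starts at most N jobs, waits for that whole batch, then starts the next, and prints rough elapsed seconds for a few batch sizes so you can compare them.

```shell
#!/bin/bash
# Hypothetical helper: run the given commands in batches of at most $1 jobs,
# waiting for each batch to finish before starting the next one.
run_batched() {
    local batch_size=$1
    shift
    local running=0
    local cmd
    for cmd in "$@"; do
        bash -c "$cmd" &            # start the job in the background
        running=$((running + 1))
        if [ "$running" -ge "$batch_size" ]; then
            wait                    # block until every job in this batch exits
            running=0
        fi
    done
    wait                            # catch a final, partially filled batch
}

# Stand-in jobs (assumption): each one just sleeps for a second.
jobs_to_run=("sleep 1" "sleep 1" "sleep 1" "sleep 1")

# Try a few batch sizes and report whole-second elapsed time for each.
for size in 1 2 4; do
    start=$SECONDS
    run_batched "$size" "${jobs_to_run[@]}"
    echo "batch size $size: $((SECONDS - start))s"
done
```

With sleep as the job, bigger batches will simply look faster because nothing is being contended for. With real disk-heavy scripts, the timings may flatten out or get worse as the batch size grows, and that flattening is the "sweet spot" you are looking for.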

Upvotes: 1

Sobrique

Reputation: 53478

You're likely correct. The thing with parallelism is that it allows you to grab multiple resources to use in parallel. That improves your speed if - and only if - that resource is your limiting factor.

So, for example, if you're reading from a disk, odds are good that the disk reads are what's limiting you, and doing more in parallel doesn't help. Indeed, because of contention, it can slow the process down. (The disk has to seek to service multiple processes, rather than just 'getting on' and serialising a read.)

So it really does boil down to what your script actually does and why it's slow. And the best way of checking that is by profiling it.

At a basic level, something like truss or strace might help.

e.g.

strace -fTtc /home/../e.sh

And see what types of system calls are being made, and how much of the total time they're consuming.
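As an even quicker first check before reaching for strace, you can compare wall-clock time with CPU time using bash's built-in time keyword. This is only a rough sketch: the function name cpu_vs_wall is made up for illustration, and "sleep 1" stands in for one of your scripts.

```shell
#!/bin/bash
# Hypothetical helper: run a command, discard its output, and print
# wall-clock (real) time next to CPU time (user and sys), in seconds.
cpu_vs_wall() {
    local TIMEFORMAT='%R %U %S'     # format used by bash's `time` keyword
    local real user sys
    # `time` reports on the shell's stderr, so capture that stream only;
    # the command's own stdout and stderr go to /dev/null.
    read -r real user sys <<< "$( { time "$@" >/dev/null 2>&1; } 2>&1 )"
    echo "real=$real user=$user sys=$sys"
}

# Stand-in workload (assumption): sleep waits without using the CPU.
cpu_vs_wall sleep 1
```

If real is much larger than user plus sys, the process spent most of its time waiting (on disk, network, or similar) rather than computing. That is the situation where running more copies in parallel is least likely to help.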

Upvotes: 1
