Reputation: 2435
I have a process that adds up a bunch of numbers:
def slow(x):
    num = 0
    for i in xrange(int(1E9)):
        num += 1
And I start 500 of these.
for x in range(500):
    out.write("Starting slow process - " + str(datetime.now()) + "\n")
    p = multiprocessing.Process(target = slow, args = (x, ))
    p.start()
I would expect the processes to start all at once, since the maximum number of processes allowed on my computer is greater than 500.
user@computer$ cat /proc/sys/kernel/pid_max
32768
However, there's a brief delay between the start time of one process and the start time of the next process.
Starting slow process - 2015-05-14 16:41:35.276839
Starting slow process - 2015-05-14 16:41:35.278016
Starting slow process - 2015-05-14 16:41:35.278666
Starting slow process - 2015-05-14 16:41:35.279328
Starting slow process - 2015-05-14 16:41:35.280053
Starting slow process - 2015-05-14 16:41:35.280751
Starting slow process - 2015-05-14 16:41:35.281444
Starting slow process - 2015-05-14 16:41:35.282094
Starting slow process - 2015-05-14 16:41:35.282720
Starting slow process - 2015-05-14 16:41:35.283364
And this delay gets longer as we start more processes:
Starting slow process - 2015-05-14 16:43:40.572051
Starting slow process - 2015-05-14 16:43:41.630004
Starting slow process - 2015-05-14 16:43:42.716438
Starting slow process - 2015-05-14 16:43:43.270189
Starting slow process - 2015-05-14 16:43:44.336397
Starting slow process - 2015-05-14 16:43:44.861934
Starting slow process - 2015-05-14 16:43:45.948424
Starting slow process - 2015-05-14 16:43:46.514324
Starting slow process - 2015-05-14 16:43:47.516960
Starting slow process - 2015-05-14 16:43:48.051986
Starting slow process - 2015-05-14 16:43:49.145923
Starting slow process - 2015-05-14 16:43:50.228910
Starting slow process - 2015-05-14 16:43:50.236215
What might account for this phenomenon?
Upvotes: 1
Views: 166
Reputation: 75629
Here are some changes to your code based on @Agrajag's suggestions, which, at least on my system, confirm his suspicions. The timestamps are collected in a list and printed afterwards instead of being written to out in the middle of spawning, and slow() now sleeps before it starts spinning the CPU.
import sys
import time
import multiprocessing
from datetime import datetime

def slow(x):
    # sleep first, so the children don't fight the parent for the CPU while it is still spawning
    time.sleep(10)
    num = 0
    for i in xrange(int(1E9)):
        num += 1

# collect the timestamps and print them only after every process has been started
times = []
for x in range(500):
    times.append(datetime.now())
    p = multiprocessing.Process(target = slow, args = (x, ))
    p.start()

for x in times:
    sys.stdout.write("Starting slow process - " + str(x) + "\n")
Starting slow process - 2015-05-18 04:17:02.557117
Starting slow process - 2015-05-18 04:17:02.574186
Starting slow process - 2015-05-18 04:17:02.594736
Starting slow process - 2015-05-18 04:17:02.616716
Starting slow process - 2015-05-18 04:17:02.637369
Starting slow process - 2015-05-18 04:17:02.658615
Starting slow process - 2015-05-18 04:17:02.675418
Starting slow process - 2015-05-18 04:17:02.696439
Starting slow process - 2015-05-18 04:17:02.713795
Starting slow process - 2015-05-18 04:17:02.734777
Starting slow process - 2015-05-18 04:17:02.753063
Upvotes: 2
Reputation: 18375
Your computer doesn't really like to run more processes than there are CPU cores. Normally that's not a big deal, because no one process is hogging the CPU: the operating system can happily allocate resources to each process in turn according to the rules of its process scheduler.
When lots of processes really need the CPU, bad things start to happen. The operating system does its best, but things are likely to slow down. None of the jobs are able to complete their task efficiently.
As you add more active processes, things get worse. Why does that happen?
Well, one factor among several is that the CPU caches are probably going to hold stale data when a new process takes over. CPUs have several levels of cache that act as super-fast memory. If a long-running process gets sole access to a CPU, it will enjoy much faster speeds because it has the cache all to itself.
When there are many more processes than CPUs, some of those processes just wait in the queue. When the OS finally allocates CPU time to one of them, its data has to be loaded into the caches all over again, which slows everything down for the next process in line.
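If you want to see the effect on your own machine, something along these lines should do it (just a rough sketch; burn() and the iteration count are arbitrary choices of mine, and the exact numbers depend on your hardware). Run the same CPU-bound job once with as many workers as you have cores and once with four times as many: the second run should take at least four times as long in total, usually a bit more, because the cores are already saturated and the switching and cache effects described above add overhead.
import time
import multiprocessing

def burn(n):
    # CPU-bound busy loop, similar in spirit to slow() in the question
    total = 0
    while total < n:
        total += 1

def run_batch(num_procs, n=5 * 10 ** 6):
    # start num_procs workers, wait for all of them, return the wall-clock time
    procs = [multiprocessing.Process(target=burn, args=(n,)) for _ in range(num_procs)]
    start = time.time()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.time() - start

if __name__ == '__main__':
    cores = multiprocessing.cpu_count()
    for count in (cores, cores * 4):
        elapsed = run_batch(count)
        print("%3d workers: %6.2fs total" % (count, elapsed))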
Oh, and let's not forget that spawning processes isn't instantaneous either. The operating system has other jobs to do, like ensuring you have access to the Internet and checking that files are being written to disk.
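The spawn cost on its own is easy to measure, too. A quick sketch (the count of 100 is arbitrary): time how long it takes just to start a batch of processes that do nothing at all.
import time
import multiprocessing

def noop():
    pass  # the child does no work at all; we only pay the cost of creating it

if __name__ == '__main__':
    count = 100
    procs = [multiprocessing.Process(target=noop) for _ in range(count)]
    start = time.time()
    for p in procs:
        p.start()
    elapsed = time.time() - start
    for p in procs:
        p.join()
    print("Started %d do-nothing processes in %.3fs (%.1f ms each)"
          % (count, elapsed, 1000.0 * elapsed / count))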
Upvotes: 1
Reputation: 1034
You are starting 500 processes, each of which you're asking to spin by counting to a billion. I'm not sure why it surprises you that this takes time.
Starting 500 processes would take a bit of time even if they did nothing, but when each of them uses Python to count to a billion, it's pretty much a given that a second or two will elapse. Those other processes now compete for CPU time, and it's not a given that the process doing the spawning wins this race and gets to spawn the rest immediately.
Edit: you're also making 500 calls to the system to get the current time and print it, which takes some amount of time too. If you printed the time only when you start and when you're done spawning, I suspect that would speed it up as well.
I suspect that this would go quicker if you replaced the counting loop with a call to sleep or something of that nature, and thus that what you're seeing is not really just the time it takes to start processes at all.
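Roughly, the experiment I have in mind would look something like this (only a sketch, reusing the 500 processes from the question and a 10-second sleep in place of the counting loop, with the time printed only before and after the spawning loop):
import time
import multiprocessing
from datetime import datetime

def slow_sleep(x):
    # sleep instead of busy-counting, so the children don't compete
    # with the parent for the CPU while it is still spawning
    time.sleep(10)

if __name__ == '__main__':
    print("Spawning started  - " + str(datetime.now()))
    procs = []
    for x in range(500):
        p = multiprocessing.Process(target=slow_sleep, args=(x,))
        p.start()
        procs.append(p)
    print("Spawning finished - " + str(datetime.now()))
    for p in procs:
        p.join()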
Upvotes: 3