egouden

Reputation: 43

Performance issue: large NumPy array and system call

There is a performance issue when making a system call after allocating a large amount of memory (e.g. a NumPy array). The slowdown grows with the amount of memory allocated.

test.py:

import os
import sys
import time
import numpy

# Measure CPU time used by this process.
start = time.process_time()
test = int(sys.argv[1])
# Allocate `test` arrays' worth of 500x500 float64 zeros before the loop.
a = numpy.zeros((test, 500, 500))
for i in range(test):
    os.system("echo true > /dev/null")
elapsed = time.process_time() - start
print(elapsed)

The per-iteration time increases dramatically:

edouard@thorin:~/now3/code$ python test.py 100
0.64
edouard@thorin:~/now3/code$ python test.py 200
2.09
edouard@thorin:~/now3/code$ python test.py 400
14.26

This should not be related to virtual memory. Is it a known issue?

Upvotes: 3

Views: 567

Answers (2)

NPE

Reputation: 500167

You seem to have narrowed the problem down to os.system() taking longer after you've allocated a large NumPy array.

Under the covers, system() uses fork(). Even though fork() is supposed to be very cheap (due to its use of copy-on-write), it turns out that things are not quite as simple.

In particular, there are known issues with Linux's fork() taking longer for larger processes. See, for example:

Both documents are fairly old, so I am not sure what the state of the art is. However, the evidence suggests that you've encountered an issue of this sort.
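One way to check whether this is what you are seeing is to time fork() directly, before and after a large allocation. The following is only a rough sketch, not a rigorous benchmark: time_fork is a hypothetical helper name, and a zero-filled bytearray stands in for the NumPy array so that the script needs nothing outside the standard library. It assumes a Unix-like system where os.fork() is available.

```python
import os
import time


def time_fork(repeats=20):
    """Return the mean wall-clock seconds the parent spends in os.fork()."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately, skipping normal interpreter cleanup.
            os._exit(0)
        total += time.perf_counter() - start
        os.waitpid(pid, 0)  # Reap the child so no zombies accumulate.
    return total / repeats


if __name__ == "__main__":
    before = time_fork()
    # Assumption: ~256 MiB of zero-filled memory stands in for the big array.
    ballast = bytearray(256 * 1024 * 1024)
    after = time_fork()
    print(f"fork() before allocation: {before * 1e6:.0f} us")
    print(f"fork() after allocation:  {after * 1e6:.0f} us")
```

If the second number is clearly larger and keeps growing as you increase the allocation, the cost is most likely in fork() itself rather than in the command being run.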

If you can't get rid of those system() calls, I would suggest two avenues of research:

  • Look into enabling huge pages.
  • Consider the possibility of spawning an auxiliary process on startup, whose job would be to invoke the necessary system() commands.
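The second suggestion could look roughly like the sketch below. It is an illustration under assumptions, not a drop-in solution: CommandHelper and _command_runner are hypothetical names, the helper is created with multiprocessing's "fork" start method (so it is Unix-only), and a bytearray again stands in for the large array. The important part is that the helper is started while the main process is still small, so every later fork happens in the small helper rather than in the large parent.

```python
import multiprocessing as mp
import subprocess


def _command_runner(conn):
    """Helper loop: run each shell command received and send back its exit code."""
    while True:
        cmd = conn.recv()
        if cmd is None:  # Sentinel: the parent asked us to stop.
            break
        conn.send(subprocess.call(cmd, shell=True))
    conn.close()


class CommandHelper:
    """Hypothetical wrapper around a small child process that runs commands."""

    def __init__(self):
        ctx = mp.get_context("fork")  # Assumption: a Unix-like platform.
        self._conn, child_conn = ctx.Pipe()
        self._proc = ctx.Process(target=_command_runner, args=(child_conn,))
        self._proc.start()
        child_conn.close()  # The parent only needs its own end of the pipe.

    def system(self, cmd):
        """Run cmd through the helper and return its exit code."""
        self._conn.send(cmd)
        return self._conn.recv()

    def close(self):
        self._conn.send(None)
        self._proc.join()


if __name__ == "__main__":
    helper = CommandHelper()  # Start the helper BEFORE the big allocation.
    ballast = bytearray(256 * 1024 * 1024)  # Stand-in for the large array.
    for _ in range(100):
        helper.system("echo true > /dev/null")
    helper.close()
```

Each call still pays for a pipe round trip, so this only helps if the fork cost in the large process dominates, which is what the timings in the question suggest.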

Upvotes: 5

mgilson

Reputation: 309821

What happens if you don't use the os.system call?

for me:

python test.py 10   # 0.14
python test.py 100  # 1.18
python test.py 1000 # 11.77

It grows approximately an order of magnitude each time without os.system. So I'd say your problem is in the system call, not in the performance of NumPy (this is confirmed by repeating the same test with the NumPy portion of the code commented out). At this point, the question becomes "Why is it slow(er) to do repeated system calls?" Unfortunately, I don't have an answer for that.

Interestingly enough, if I do this in bash, there is no problem (it returns almost immediately):

time for i in `seq 1 1000`; do echo true > /dev/null; done

It also seems that the problem isn't just os.system: subprocess.Popen suffers from the same malady (although subprocess may just call os.system under the hood, I don't actually know).

EDIT

This is getting better and better. In my previous tests, I was leaving the allocation of the NumPy array ... If you remove the allocation of the NumPy array as well, the test goes relatively fast. However, allocating the (1000,800,800) array only takes ~1 second. So the allocation isn't taking all (or even much) of the time, and assigning data to the array doesn't take much time either, but whether the array has been allocated does affect how long the system call takes to execute. Very weird.

Upvotes: 4
