Is this a cache thing, as timeit suggests?
In [55]: timeit a = zeros((10000, 400))
100 loops, best of 3: 3.11 ms per loop
In [56]: timeit a = zeros((10000, 500))
The slowest run took 13.43 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.43 µs per loop
Tried to fool it, but it didn't work:
In [58]: timeit a = zeros((10000, 500+random.randint(100)))
The slowest run took 13.31 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.35 µs per loop
Upvotes: 2
Views: 71
The reason is not caching but that NumPy only creates a placeholder instead of the full array: the memory is reserved, but not backed by physical RAM until it is first written. You can easily verify this by monitoring your RAM usage while you do something like this:
a = np.zeros((20000, 20000), np.float64)
This doesn't allocate 20000 * 20000 * 8 bytes ≈ 3 GB on my computer (although this might be OS-dependent, because np.zeros uses the C function calloc). But be careful: most operations on this array (for example a += 5) will immediately allocate that memory! Choose a size appropriate for your RAM so that you notice the increase without exhausting it.
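As a quick illustration of that check, here is a minimal sketch using psutil to read the process's resident set size (psutil, the rss_mb helper and the chosen array size are my own additions, not part of the original answer):

import numpy as np
import psutil

proc = psutil.Process()

def rss_mb():
    # resident set size (physical RAM actually in use) in MiB
    return proc.memory_info().rss / 2**20

before = rss_mb()
a = np.zeros((20000, 20000), np.float64)  # nominally ~3 GB; shrink this if RAM is tight
after_zeros = rss_mb()
a += 5                                    # touching the pages forces the real allocation
after_write = rss_mb()

print(before, after_zeros, after_write)
# Expect after_zeros to stay close to before, and after_write to jump by roughly 3 GB.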
In the end this just postpones the allocation of the array; as soon as you operate on it, the combined timing of allocation and operation should be as expected (linear in the number of elements). It seems you're using IPython, so you can use the cell-level %%timeit:
%%timeit
a = np.zeros((10000, 400))
a += 10
# => 10 loops, best of 3: 30.3 ms per loop
%%timeit
a = np.zeros((10000, 800))
a += 10
# => 10 loops, best of 3: 60.2 ms per loop
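If you're not in IPython, a rough equivalent with the standard timeit module (the number and repeat values below are my own choices, not from the original answer) shows the same roughly linear scaling:

import timeit

setup = "import numpy as np"
for cols in (400, 800):
    stmt = "a = np.zeros((10000, %d)); a += 10" % cols
    # best of 3 runs, 10 loops each, reported per loop
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=10)) / 10
    print("(10000, %d): %.1f ms per loop" % (cols, best * 1e3))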
Upvotes: 2