Reputation: 125
I'm trying to optimize a program I've written and keep running into stumbling blocks. I have so many questions I don't know where to begin, so for starters I'll keep it to one obstacle I can't seem to overcome.
The code is a small schedule generator for a workplace that requires 24/7 coverage. Each shift covers a two-week span (some shifts rotate over the two weeks, but the coverage requirements must hold throughout, which is why I have to use 14 days). Right now I'm trying to find the fastest way to check whether a combination of shifts adds up to the right number of people on a given day. I keep hearing that NumPy is very fast at this kind of thing, but when I run the following:
import numpy as np
import time
c_ar = np.array([1,1,1,1,0,0,0,1,1,1,1,0,0,0])
d_ar = np.array([0,0,0,1,1,1,1,0,0,0,1,1,1,1])
e_ar = np.array([0,0,0,1,1,1,1,0,0,0,1,1,1,1])
m_ar = np.array([0,1,1,0,1,1,0,0,1,1,0,1,1,0])
p_ar = np.array([1,1,0,0,0,1,1,1,1,0,0,0,1,1])
t0 = time.time()
x = c_ar[0] + d_ar[0] + e_ar[0] + m_ar[0] + p_ar[0]
t1 = time.time()
print(t1 - t0)
I get back:
2.19345092773e-05
However, if I run:
c = [1,1,1,1,0,0,0,1,1,1,1,0,0,0]
d = [0,0,0,1,1,1,1,0,0,0,1,1,1,1]
e = [0,0,0,1,1,1,1,0,0,0,1,1,1,1]
m = [0,1,1,0,1,1,0,0,1,1,0,1,1,0]
p = [1,1,0,0,0,1,1,1,1,0,0,0,1,1]
t2 = time.time()
y = c[0] + d[0] + e[0] + m[0] + p[0]
t3 = time.time()
print(t3 - t2)
I get back:
1.90734863281e-06
Am I missing something about NumPy that would make it faster than my example? Also, is there an even faster way than the two methods I used above?
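As a side note on methodology: at this scale a single pair of time.time() calls mostly measures call overhead and noise, while the standard timeit module averages over many runs. A minimal sketch of the same comparison:
import timeit

setup = """
import numpy as np
c_ar = np.array([1,1,1,1,0,0,0,1,1,1,1,0,0,0])
c = [1,1,1,1,0,0,0,1,1,1,1,0,0,0]
"""
# Each statement is run a million times and the total time is reported
print(timeit.timeit("c_ar[0] + c_ar[1]", setup=setup, number=1000000))
print(timeit.timeit("c[0] + c[1]", setup=setup, number=1000000))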
Upvotes: 2
Views: 1973
Reputation: 7612
A lot probably depends on how you arrange your data and what you want to do with it. Converting to a NumPy array just to sum it up is not necessary. With this setup:
import numpy as np
a = [1,1,1,1,0,0,0,1,1,1,1,0,0,0]
....
l = [0,0,0,1,1,1,1,0,0,0,1,1,1,1]
arr = np.row_stack([a, b, c, d, e, f, g, h, i, j, k, l])
with plain Python I get
>>> %timeit [sum(col) for col in zip(a, b, c, d, e, f, g, h, i, j, k, l)]
100000 loops, best of 3: 8.84 µs per loop
and with NumPy
>>> %timeit arr.sum(0)
100000 loops, best of 3: 6.71 µs per loop
In my experience Cython is faster at summing small arrays, but only when you stay inside Cython rather than calling into it repeatedly from Python. That is, if you move a whole batch of computations to Cython, you may be better off with a small hand-written summation routine than with a NumPy routine. A self-made Cython function sumcols, however, happens to be slower when called from Python:
>>> %timeit sumcols(arr)
100000 loops, best of 3: 9.86 µs per loop
Remember that if you had long rows, a transposed array might be faster, because NumPy arrays are row-major (C order) by default. It doesn't make a difference in this case, though:
>>> arr_T = arr.transpose(1, 0).copy()
>>> %timeit arr_T.sum(1)
100000 loops, best of 3: 6.59 µs per loop
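For what it's worth, the balance tips toward NumPy quickly as the arrays grow. A way to check this on your own machine (the sizes here are illustrative, not from your problem):
import numpy as np

# A larger test case: 12 shift patterns over 10,000 days (illustrative size)
big = np.random.randint(0, 2, size=(12, 10000))
cols = list(zip(*big.tolist()))

# In IPython, compare:
#   %timeit big.sum(0)
#   %timeit [sum(col) for col in cols]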
Upvotes: 0
Reputation: 880907
Put all the data into one NumPy array, then call numpy.sum once:
arr.sum(axis=0)
NumPy arrays are no faster than regular Python code when all you use them for is accessing individual values item by item, as is done here:
c_ar[0] + d_ar[0] + e_ar[0] + m_ar[0] + p_ar[0]
Also, for arrays this small, regular Python code may be faster than using NumPy arrays:
c = [1,1,1,1,0,0,0,1,1,1,1,0,0,0]
d = [0,0,0,1,1,1,1,0,0,0,1,1,1,1]
e = [0,0,0,1,1,1,1,0,0,0,1,1,1,1]
m = [0,1,1,0,1,1,0,0,1,1,0,1,1,0]
p = [1,1,0,0,0,1,1,1,1,0,0,0,1,1]
arr = np.row_stack([c,d,e,m,p])
In [226]: %timeit c[0] + d[0] + e[0] + m[0] + p[0]
10000000 loops, best of 3: 189 ns per loop
In [231]: %timeit arr[:,0].sum()
100000 loops, best of 3: 4.73 µs per loop
In [239]: %timeit [c[i] + d[i] + e[i] + m[i] + p[i] for i in range(len(c))]
100000 loops, best of 3: 3.68 µs per loop
In [240]: %timeit arr.sum(axis=0)
100000 loops, best of 3: 5.04 µs per loop
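Tying this back to the schedule problem: once all the shift patterns live in one 2-D array, the whole 14-day coverage check becomes a single vectorized expression. A sketch, where the required head-count of 3 is an assumption:
import numpy as np

# One row per shift, one column per day (vstack is equivalent to row_stack here)
arr = np.vstack([c, d, e, m, p])

required = 3                         # hypothetical minimum staff per day
coverage = arr.sum(axis=0)           # people on duty for each of the 14 days
print(np.all(coverage >= required))  # True only if every day is covered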
Upvotes: 1
Reputation: 395893
Be less concerned with speed than with readability and with using the library's functions as directly as they're given. Implementations may vary, but if you're doing the semantically correct thing, you've made the right decision in the long run. Only trade those qualities away for speed once you've profiled your code and identified an expensive bottleneck.
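If you do reach that point, the standard library's cProfile is one way to find the hot spots (a minimal sketch; build_schedule is a hypothetical stand-in for your generator):
import cProfile

def build_schedule():
    # Hypothetical stand-in for the real schedule generator
    return sum(i % 3 for i in range(10**6))

# Print a profile of the call, sorted by cumulative time per function
cProfile.run("build_schedule()", sort="cumulative")
That said, the direct and readable way to sum columns here is NumPy's own sum method: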
>>> np.array([[1, 1], [1, 1], [1, 1]])
array([[1, 1],
       [1, 1],
       [1, 1]])
>>> np.array([[1,1],[1,1], [1,1]]).sum(axis=0)
array([3, 3])
If you need to retain dimensionality:
>>> np.array([[1,1],[1,1], [1,1]]).sum(axis=0, keepdims=True)
array([[3, 3]])
One reason you might want to do this is to concatenate the sum as a row:
>>> arr = np.array([[1,1],[1,1], [1,1]])
>>> np.vstack((arr, arr.sum(axis=0, keepdims=True)))
array([[1, 1],
       [1, 1],
       [1, 1],
       [3, 3]])
Upvotes: 1