Reputation: 9806
I have to store a list of numpy matrices. Two approaches I tried:
1. Create a list and append to it:

ls_ws = []
for ind in range(iterations):
    ...
    ls_ws.append(ls_w)  # ls_w is a matrix of size (6, 1)
Problem: fast at the start, but gets very slow toward the end.
2. Create a matrix of zeros and fill its columns:

ls_ws = np.matrix(np.zeros((6, iterations)))
for ind in range(iterations):
    ...
    ls_ws[:, ind] = ls_w
Problem: I am not sure, but this seems to run at the same speed regardless of which iteration it is on.
The weird thing: if I set iterations to a small number it is fast, but if I set it to a large number like 1500, every iteration is slow, even the first. The second approach should be fast, so why is it slow for a large number of iterations, and how can I fix it?
Upvotes: 1
Views: 87
Reputation: 231385
With a barebones version of the list append:

import numpy as np

def foo1(N):
    alist = []
    for _ in range(N):
        alist.append([1, 2, 3, 4, 5, 6])
    return np.array(alist)

times scale smoothly with N:
In [4]: timeit foo1(100)
10000 loops, best of 3: 123 µs per loop
In [5]: timeit foo1(1000)
1000 loops, best of 3: 1.23 ms per loop
In [6]: timeit foo1(10000)
100 loops, best of 3: 12.3 ms per loop
In [7]: timeit foo1(100000)
10 loops, best of 3: 129 ms per loop
In [8]: timeit foo1(1000000)
1 loop, best of 3: 1.29 s per loop
In [9]: timeit foo1(10000000)
1 loop, best of 3: 12.9 s per loop
List append has the potential of slowing down with large lists. A list uses a buffer to hold the pointers, plus some grow space. When it outgrows that space it has to get more. If it can't expand in place, it has to ask for new space and copy all the pointers. That can slow things down for very large lists that come close to filling memory. I may be seeing a hint of that in the creep from 123 to 129 in the leading digits of the timings above.
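To see that over-allocation directly (my own aside, not part of the original answer), you can watch the allocated size of a growing list with sys.getsizeof:

```python
import sys

# Watch the allocated size (in bytes) of a list as it grows.
# CPython over-allocates: the size jumps only occasionally, and each
# jump reserves grow space for several future appends.
lst = []
sizes = []
for _ in range(100):
    lst.append(None)
    sizes.append(sys.getsizeof(lst))

# Most appends reuse existing grow space, so there are far fewer
# distinct allocation sizes than appends.
print(len(set(sizes)))
```

When a jump does happen and the buffer cannot grow in place, that is when the pointer copy described above occurs.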
def foo2(N):
    out = np.zeros((N, 6), int)
    for i in range(N):
        out[i, :] = [1, 2, 3, 4, 5, 6]
    return out
Times also scale with N, and are about double the list case:
In [15]: timeit foo2(100)
1000 loops, best of 3: 242 µs per loop
In [16]: timeit foo2(1000)
100 loops, best of 3: 2.52 ms per loop
In [17]: timeit foo2(10000)
10 loops, best of 3: 24.6 ms per loop
In [18]: timeit foo2(100000)
1 loop, best of 3: 249 ms per loop
In [19]: timeit foo2(1000000)
1 loop, best of 3: 2.52 s per loop
In [20]: timeit foo2(10000000)
1 loop, best of 3: 25.2 s per loop
Iterating on the last dimension instead makes no difference in the times:

def foo3(N):
    out = np.zeros((6, N), int)
    for i in range(N):
        out[:, i] = [1, 2, 3, 4, 5, 6]
    return out
Iterating thousands of times while assigning or creating just 6 elements at a time is poor numpy practice. If you must iterate, it is better to loop a few times while doing large operations each time, e.g. 6 loops over 1000-item arrays.
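A sketch of that suggestion, looping over the 6 rows instead of the N columns (foo_rowwise is my own illustrative name, not from the answer):

```python
import numpy as np

def foo_rowwise(N):
    # Loop over the small dimension: 6 Python-level iterations,
    # each assigning N elements at once.
    out = np.zeros((6, N), int)
    vals = [1, 2, 3, 4, 5, 6]
    for r in range(6):
        out[r, :] = vals[r]  # one large vectorized assignment per row
    return out
```

Each iteration now does O(N) work in compiled code, so the Python loop overhead is paid only 6 times.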
If I move the iteration down into compiled code, times are much faster:

def foo4(N):
    out = np.zeros((N, 6), int)
    out[...] = [1, 2, 3, 4, 5, 6]  # one broadcast assignment fills all rows
    return out
In [28]: timeit foo4(1000)
(timeit noted that intermediate results may be cached)
10000 loops, best of 3: 20.4 µs per loop
In [29]: timeit foo4(100000)
1000 loops, best of 3: 1.45 ms per loop
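As a quick sanity check (my own addition), the list-append and broadcast versions build identical arrays; only the fill strategy, and hence the speed, differs:

```python
import numpy as np

def foo1(N):
    # list append, converted to an array at the end
    alist = []
    for _ in range(N):
        alist.append([1, 2, 3, 4, 5, 6])
    return np.array(alist)

def foo4(N):
    # preallocate, then fill with a single broadcast assignment
    out = np.zeros((N, 6), int)
    out[...] = [1, 2, 3, 4, 5, 6]
    return out

print(np.array_equal(foo1(1000), foo4(1000)))  # True
```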
You talk about saving a matrix, and even allocate a np.matrix. So let's see what effect that has:
def foo1m(N):
    alist = []
    for _ in range(N):
        alist.append(np.matrix([1, 2, 3, 4, 5, 6]).T)
    return np.concatenate(alist, axis=1)

def foo2m(N):
    out = np.matrix(np.zeros((6, N), int))
    for i in range(N):
        out[:, i] = np.matrix([1, 2, 3, 4, 5, 6]).T
    return out
In [62]: timeit foo1(1000)
1000 loops, best of 3: 1.32 ms per loop
In [63]: timeit foo1m(1000)
100 loops, best of 3: 18 ms per loop
In [64]: timeit foo2(1000)
100 loops, best of 3: 2.75 ms per loop
In [65]: timeit foo2m(1000)
10 loops, best of 3: 28.2 ms per loop
That makes the per-iteration times about 10x longer. matrix is a subclass of ndarray, and requires more processing (to force results to stay 2-d, etc.).
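A small illustration of that overhead (my own aside): np.matrix forces every result to stay 2-d, so even simple indexing does extra work compared with a plain ndarray:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])     # plain ndarray, 1-d
m = np.matrix([1, 2, 3, 4, 5, 6])    # ndarray subclass, always 2-d

print(a.shape)   # (6,)
print(m.shape)   # (1, 6)

# Indexing a matrix returns another matrix, never a 1-d view:
print(type(m[0]))
```

Recent numpy releases discourage np.matrix in favor of plain arrays, partly for this reason.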
Upvotes: 1
Reputation: 249273
I think you almost have a good solution here, but your treatment of columns vs. rows is probably hobbling the performance:
ls_ws = np.matrix(np.zeros((6, iterations)))
for ind in range(iterations):
    ...
    ls_ws[:, ind] = ls_w
You should be iterating within the first dimension rather than the last. You could swap the dimensions, or just do this:
ls_ws = np.matrix(np.zeros((6, iterations), order='F'))
That may help when the array is large, because each iteration will then write to a contiguous chunk of 6 elements rather than to scattered locations in memory.
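The effect can be checked directly (my own sketch): with order='F' each column occupies a contiguous block of memory, which is exactly what the column-wise assignment touches:

```python
import numpy as np

c_major = np.zeros((6, 1000))             # default order='C' (row-major)
f_major = np.zeros((6, 1000), order='F')  # column-major

# A single column is contiguous only in the Fortran-ordered array.
print(c_major[:, 0].flags['C_CONTIGUOUS'])  # False
print(f_major[:, 0].flags['C_CONTIGUOUS'])  # True
```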
Upvotes: 5