Abhishek Bhatia

Reputation: 9806

How to store numpy objects (matrices) fast?

I have to store a list of numpy matrices. Two approaches I tried:

1. Create a list and append to it:

ls_ws=[]
for ind in range(iterations):
    ...
    ls_ws.append(ls_w) # ls_w is a matrix of size 6,1

Problem: fast at the start, but it gets very slow toward the end.

2. Create a matrix of zeros and modify it:

ls_ws = np.matrix(np.zeros((6,iterations)))
for ind in range(iterations):
    ...
    ls_ws[:,ind] = ls_w # ls_w is a matrix of size 6,1

Problem: I am not entirely sure, but this seems to run at the same speed no matter which iteration it is on.
The weird thing is that if I set iterations to a small number it is fast, but if I set it to a large number like 1500, every iteration is very slow (even the first one).


The second approach should be fast, but it is very slow if I set iterations to a large number. Why? How can I resolve it?

Upvotes: 1

Views: 87

Answers (2)

hpaulj

Reputation: 231385

With a barebones version of the list append:

import numpy as np

def foo1(N):
    alist = []
    for _ in range(N):
        alist.append([1,2,3,4,5,6])
    return np.array(alist)

times scale smoothly with N:

In [4]: timeit foo1(100)
10000 loops, best of 3: 123 µs per loop
In [5]: timeit foo1(1000)
1000 loops, best of 3: 1.23 ms per loop
In [6]: timeit foo1(10000)
100 loops, best of 3: 12.3 ms per loop
In [7]: timeit foo1(100000)
10 loops, best of 3: 129 ms per loop
In [8]: timeit foo1(1000000)
1 loop, best of 3: 1.29 s per loop
In [9]: timeit foo1(10000000)
1 loop, best of 3: 12.9 s per loop

List append has the potential to slow down with large lists. A list uses a buffer to hold the pointers, plus some grow space. When it outgrows that space it has to get more; if it can't expand in place, it has to ask for new space and copy all the pointers. That can slow things down for very large lists that come close to filling memory. I may be seeing hints of that in the drift of the leading digits from 123 to 129 across the timings above.
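That grow-and-copy behaviour is easy to watch in CPython with sys.getsizeof (a minimal illustration of my own, separate from the timings above):

import sys

alist = []
last = 0
for n in range(64):
    alist.append(None)
    size = sys.getsizeof(alist)
    if size != last:        # prints only when the buffer was reallocated
        print(len(alist), size)
        last = size

The reported size jumps in increasingly large steps; each jump is a reallocation of the over-allocated grow space, which may involve copying all the pointers.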

def foo2(N):
    out = np.zeros((N,6),int)
    for i in range(N):
        out[i,:] = [1,2,3,4,5,6]
    return out

Times also scale with N, and are about double those of the list case:

In [15]: timeit foo2(100)
1000 loops, best of 3: 242 µs per loop
In [16]: timeit foo2(1000)
100 loops, best of 3: 2.52 ms per loop
In [17]: timeit foo2(10000)
10 loops, best of 3: 24.6 ms per loop
In [18]: timeit foo2(100000)
1 loop, best of 3: 249 ms per loop
In [19]: timeit foo2(1000000)
1 loop, best of 3: 2.52 s per loop
In [20]: timeit foo2(10000000)
1 loop, best of 3: 25.2 s per loop

Iterating on the last dimension makes no difference in the times:

def foo3(N):
    out = np.zeros((6,N),int)
    for i in range(N):
        out[:,i] = [1,2,3,4,5,6]
    return out

Iterating thousands of times while assigning or creating just 6 elements at a time is poor numpy practice. If you must iterate, it is better to loop a few times while doing large operations on each pass, e.g. 6 loops over 1000-item arrays, as sketched below.
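As a sketch of that idea for this example (foo5 is my name, not from the answer), turn the loop inside out: iterate over the 6 rows and assign N elements at a time. It builds the same array as foo3:

def foo5(N):
    out = np.zeros((6,N),int)
    for j in range(6):      # only 6 iterations ...
        out[j,:] = j+1      # ... each assigning a whole N-element row at once
    return out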

If I move the iteration down into compiled code, the times are much faster:

def foo4(N):
    out = np.zeros((N,6),int)
    out[...] = [1,2,3,4,5,6]
    return out

In [28]: timeit foo4(1000)
The slowest run took ... times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 20.4 µs per loop
In [29]: timeit foo4(100000)
1000 loops, best of 3: 1.45 ms per loop

You talk about storing matrices, and you even allocate an np.matrix. So let's see what effect that has:

def foo1m(N):
    alist = []
    for _ in range(N):
        alist.append(np.matrix([1,2,3,4,5,6]).T)
    return np.concatenate(alist, axis=1)

def foo2m(N):
    out = np.matrix(np.zeros((6,N),int))
    for i in range(N):
        out[:,i] = np.matrix([1,2,3,4,5,6]).T
    return out

In [62]: timeit foo1(1000)
1000 loops, best of 3: 1.32 ms per loop
In [63]: timeit foo1m(1000)
100 loops, best of 3: 18 ms per loop
In [64]: timeit foo2(1000)
100 loops, best of 3: 2.75 ms per loop
In [65]: timeit foo2m(1000)
10 loops, best of 3: 28.2 ms per loop

That makes the per-iteration times 10x longer. matrix is a subclass of ndarray, and requires more processing (to force results to 2d, etc.).
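The extra overhead shows up even in bare construction of the 6-element column, before any loop or assignment (a quick sketch of my own; absolute numbers will vary by machine):

from timeit import timeit

setup = "import numpy as np"
t_arr = timeit("np.array([1,2,3,4,5,6])", setup=setup, number=100000)
t_mat = timeit("np.matrix([1,2,3,4,5,6]).T", setup=setup, number=100000)
print(t_arr, t_mat)   # expect the matrix version to be several times slower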

Upvotes: 1

John Zwinck

Reputation: 249273

I think you almost have a good solution here, but your treatment of columns vs. rows is probably hobbling the performance:

ls_ws = np.matrix(np.zeros((6,iterations)))
for ind in range(iterations):
    ...
    ls_ws[:,ind] = ls_w

You should be iterating along the first dimension rather than the last, so that each assignment writes a contiguous chunk of memory. You could swap the dimensions, or just do this:

ls_ws = np.matrix(np.zeros((6,iterations), order='F'))

That may help when the array is large, because each iteration will then access a contiguous chunk of 6 elements rather than 6 scattered locations.
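You can see the layout difference in the strides (a small check of my own, not part of the answer): with the default C order the elements of one column sit a full row apart, while with order='F' they are adjacent.

import numpy as np

c = np.zeros((6,1500))              # C order (default): rows contiguous
f = np.zeros((6,1500), order='F')   # Fortran order: columns contiguous

print(c.strides)   # (12000, 8): elements of one column are 12000 bytes apart
print(f.strides)   # (8, 48): elements of one column are adjacent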

Upvotes: 5
