N. Gene Eer

Reputation: 39

Python NumPy array is slow(er) than list

I'm working with a 2D array and am basically just trying to add a constant value to every element. I need to speed the code up, so I tried using a NumPy array instead of a list of lists, but NumPy turns out to be slower. Any idea what I'm doing wrong? Thanks.

For example:

import time
import numpy as np

my_array_list = [[1,2,3],[4,5,6],[7,8,9]]
my_array_np = np.array(my_array_list)

n = 100000

# NumPy array: Python-level loop that reads and writes one element at a time
s_np = time.time()
for a in range(n):
    for i in range(3):
        for j in range(3):
            my_array_np[i,j] = my_array_np[i,j] + 5
end_np = time.time() - s_np  

# List of lists: the same Python-level loop
s_list = time.time()
for a in range(n):
    for i in range(3):
        for j in range(3):
            my_array_list[i][j] = my_array_list[i][j] + 5
end_list = time.time() - s_list 

print('my_array_np:', '\n', my_array_np, '\n')
print('my_array_list:', '\n',my_array_list, '\n')

print('time to complete with numpy:', end_np)
print('time to complete with list:', end_list)

Output:

my_array_np: 
 [[500001 500002 500003]
 [500004 500005 500006]
 [500007 500008 500009]] 

my_array_list: 
 [[500001, 500002, 500003], [500004, 500005, 500006], [500007, 500008, 500009]] 

time to complete with numpy: 0.7831366062164307
time to complete with list: 0.45527076721191406

With this test, the list version is significantly faster (0.45 vs 0.78 seconds). Shouldn't NumPy be significantly faster here?

Upvotes: 2

Views: 1995

Answers (2)

hpaulj

Reputation: 231325

Let's say you want to add something to all elements that are multiples of 3. Instead of iterating over all elements of the array, we would normally use a mask:

In [355]: x = np.arange(12).reshape(3,4)                                                       
In [356]: mask = (x%3)==0                                                                      
In [357]: mask                                                                                 
Out[357]: 
array([[ True, False, False,  True],
       [False, False,  True, False],
       [False,  True, False, False]])
In [358]: x[mask] += 100                                                                       
In [359]: x                                                                                    
Out[359]: 
array([[100,   1,   2, 103],
       [  4,   5, 106,   7],
       [  8, 109,  10,  11]])

Many operations are ufuncs, which accept a where parameter:

In [360]: x = np.arange(12).reshape(3,4)                                                       
In [361]: np.add(x,100, where=mask, out=x)                                                     
Out[361]: 
array([[100,   1,   2, 103],
       [  4,   5, 106,   7],
       [  8, 109,  10,  11]])
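With where, the ufunc only writes the positions where the mask is True. If out is omitted, the False positions of the returned array are left uninitialized, which is why out=x is passed above. A minimal sketch of a non-mutating alternative using np.where, which builds a new array and leaves x untouched:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
mask = (x % 3) == 0

# np.where takes x + 100 where mask is True and x elsewhere; x is not modified
y = np.where(mask, x + 100, x)

# In-place version from above: only the masked positions of x are overwritten
np.add(x, 100, where=mask, out=x)

print(y)
print(np.array_equal(x, y))  # True
```

np.where computes x + 100 for every element and then selects, so it allocates temporaries; the where/out form avoids that when modifying in place is acceptable.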

Fast numpy code requires thinking in terms of the whole array. The fast compiled code operates on whole arrays, or blocks of arrays. Python-level iteration over an array is slow; as you found out, it is slower than iteration over a list, because accessing an individual value of an array is more expensive than accessing a list item.
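To see where that extra cost comes from: indexing a single array element boxes the raw value into a new NumPy scalar object on every access, whereas a list simply returns the Python int it already holds. A small sketch (timings depend on machine and NumPy version, so none are shown); converting with .tolist() is one way to get plain Python ints when a Python-level loop really is unavoidable:

```python
import timeit
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
lst = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Indexing an array element creates a NumPy scalar object (e.g. numpy.int64)
print(type(arr[1, 1]))
# Indexing a list returns the int object the list already stores
print(type(lst[1][1]))      # <class 'int'>

# .tolist() converts the whole array into nested lists of Python ints
as_list = arr.tolist()
print(type(as_list[1][1]))  # <class 'int'>

# Rough per-access comparison; absolute numbers vary from machine to machine
t_arr = timeit.timeit(lambda: arr[1, 1], number=100_000)
t_lst = timeit.timeit(lambda: lst[1][1], number=100_000)
print(f"array element access: {t_arr:.4f} s, list element access: {t_lst:.4f} s")
```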

For this small example, these whole-array methods are faster than iterating over the array, though they are still slower than iterating over the list. But the array methods scale much better.
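Applied to the question's own loop, a minimal sketch: the two inner loops over i and j become a single whole-array +=. Because the increment here is a constant, adding 5 a total of n times is also the same as adding 5 * n once, so in this particular case the outer loop can be collapsed as well. Both forms reproduce the result of the element-by-element version:

```python
import numpy as np

n = 100000
start = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Whole-array version: one compiled addition over all 9 elements per iteration
a = start.copy()
for _ in range(n):
    a += 5

# Collapsed version: the n additions of 5 folded into one addition of 5 * n
b = start + 5 * n

print(a)
print(np.array_equal(a, b))  # True
```

The collapsed form only applies because the same constant is added every time; the whole-array += form is the general pattern.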

Upvotes: 2

Johnny

Reputation: 694

Hmm... It seems that the list comprehension is faster in this case, but NumPy is faster when I add numba:

import time
import numpy as np
from numba import jit


my_array_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
my_array_np = np.array(my_array_list)

n = 1000000


# @jit  # uncomment to compile fun1 with numba
def fun1(my_array_np):
    # in-place whole-array addition: no new array is created per iteration
    for a in range(n):
        my_array_np += 5


s_np = time.time()
fun1(my_array_np)
end_np = time.time() - s_np


def fun2(my_array_list):
    # rebuild the nested list with a list comprehension on every iteration
    for a in range(n):
        my_array_list = [[i + 5 for i in j] for j in my_array_list]
    return my_array_list


s_list = time.time()
my_array_list = fun2(my_array_list)
end_list = time.time() - s_list

print('my_array_np:', '\n', my_array_np, '\n')
print('my_array_list:', '\n', my_array_list, '\n')

print('time to complete with numpy:', end_np)
print('time to complete with list:', end_list)

Output:

my_array_np: 
 [[500001 500002 500003]
 [500004 500005 500006]
 [500007 500008 500009]] 

my_array_list: 
 [[500001, 500002, 500003], [500004, 500005, 500006], [500007, 500008, 500009]] 


# with @jit enabled on fun1
time to complete with numpy: 0.27802205085754395
time to complete with list: 1.9161949157714844

# with @jit commented out
time to complete with numpy: 3.4962515830993652
time to complete with list: 1.9761543273925781
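One caveat when timing numba: the first call to a jitted function includes compilation, so timing that first call mixes compile time with run time. A sketch that calls the function once on a throwaway copy to warm it up before timing. It uses numba's njit (nopython-mode) decorator and assumes numba is installed; the fallback decorator and the name add_repeatedly are hypothetical, added only so the sketch still runs as plain Python without numba:

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # hypothetical fallback: run the function as plain Python if numba is missing
    def njit(func):
        return func

n = 100000


@njit
def add_repeatedly(arr, times):
    # in-place whole-array addition, repeated `times` times
    for _ in range(times):
        arr += 5


arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Warm-up call on a copy with the same dtype and shape; with numba this compiles
add_repeatedly(arr.copy(), 1)

start = time.perf_counter()
add_repeatedly(arr, n)
elapsed = time.perf_counter() - start

print(arr)
print("time excluding compilation:", elapsed)
```

The warm-up uses the same argument types as the timed call, so numba can reuse the already compiled version instead of compiling a new one inside the timed region.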

Upvotes: 0
