Reputation: 39
I'm working with a 2D array and just trying to add a constant value to every element. I need to speed the code up, so I tried a numpy array instead of a list of lists, but numpy turns out to be slower. Any idea what I'm doing wrong? Thanks.
For example:
import time
import numpy as np
my_array_list = [[1,2,3],[4,5,6],[7,8,9]]
my_array_np = np.array(my_array_list)
n = 100000
s_np = time.time()
for a in range(n):
    for i in range(3):
        for j in range(3):
            my_array_np[i,j] = my_array_np[i,j] + 5
end_np = time.time() - s_np
s_list = time.time()
for a in range(n):
    for i in range(3):
        for j in range(3):
            my_array_list[i][j] = my_array_list[i][j] + 5
end_list = time.time() - s_list
print('my_array_np:', '\n', my_array_np, '\n')
print('my_array_list:', '\n',my_array_list, '\n')
print('time to complete with numpy:', end_np)
print('time to complete with list:', end_list)
Output:
my_array_np:
[[500001 500002 500003]
[500004 500005 500006]
[500007 500008 500009]]
my_array_list:
[[500001, 500002, 500003], [500004, 500005, 500006], [500007, 500008, 500009]]
time to complete with numpy: 0.7831366062164307
time to complete with list: 0.45527076721191406
In this test the list version completes significantly faster: 0.45 vs 0.78 seconds. Shouldn't numpy be significantly faster here?
Upvotes: 2
Views: 1995
Reputation: 231325
Let's say you want to add something to all elements that are multiples of 3. Instead of iterating over all elements of the array, we would normally use a mask:
In [355]: x = np.arange(12).reshape(3,4)
In [356]: mask = (x%3)==0
In [357]: mask
Out[357]:
array([[ True, False, False,  True],
       [False, False,  True, False],
       [False,  True, False, False]])
In [358]: x[mask] += 100
In [359]: x
Out[359]:
array([[100,   1,   2, 103],
       [  4,   5, 106,   7],
       [  8, 109,  10,  11]])
Many operations are ufuncs, which have a where parameter:
In [360]: x = np.arange(12).reshape(3,4)
In [361]: np.add(x,100, where=mask, out=x)
Out[361]:
array([[100,   1,   2, 103],
       [  4,   5, 106,   7],
       [  8, 109,  10,  11]])
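One detail worth spelling out about the where parameter: when out is omitted, the positions where the mask is False are left uninitialized in the result. A minimal sketch, assuming you want a new array and want to keep x unchanged, is to pass a copy as out; np.where is shown as an alternative that builds the same result (the names result and alt are just for illustration):

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
mask = (x % 3) == 0

# Pass a copy as `out` so that unmasked positions keep their original
# values and `x` itself is not modified.
result = np.add(x, 100, where=mask, out=x.copy())

# Alternative: compute x + 100 everywhere, then select per element.
alt = np.where(mask, x + 100, x)

print(result)
print(np.array_equal(result, alt))
```

The np.where version does the addition on every element and then picks values, so it does a bit more work, but it never exposes uninitialized memory.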
Fast numpy requires that we think in terms of the whole array. The fast compiled code operates on arrays, or blocks of arrays. Python-level iteration over arrays is slow: slower, as you found out, than iteration over lists. Accessing individual values of an array is more expensive than accessing list elements, because each access has to create a new Python object for the value.
For this small example, these whole-array methods are faster than iterating over the array, though they are still slower than iterating over the list. But the array methods scale much better.
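To see the scaling for yourself, here is a rough sketch that times a nested list comprehension against a single whole-array addition for a few square sizes. The sizes, the repeat count, and the helper names add_list and add_np are arbitrary choices for illustration; the actual timings depend on your machine and numpy version, so none are quoted here.

```python
import timeit

import numpy as np

# Hypothetical sizes, chosen only for illustration.
for size in (3, 30, 300):
    data_list = [[1] * size for _ in range(size)]
    data_np = np.ones((size, size), dtype=np.int64)

    def add_list():
        # Pure-Python element-wise addition via a nested comprehension.
        return [[value + 5 for value in row] for row in data_list]

    def add_np():
        # One whole-array operation; the element loop runs in compiled code.
        return data_np + 5

    # Sanity check: both approaches produce the same values.
    assert add_np().tolist() == add_list()

    t_list = timeit.timeit(add_list, number=50)
    t_np = timeit.timeit(add_np, number=50)
    print(f"{size}x{size}: list {t_list:.4f}s, numpy {t_np:.4f}s")
```

For the 3x3 case the fixed per-call overhead of numpy tends to dominate, which is consistent with the timings in the question; the gap should shift in numpy's favor as the arrays grow.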
Upvotes: 2
Reputation: 694
It seems that a list comprehension is faster in this case. But numpy is faster once I add numba.
import dis
import time
import numpy as np
from numba import jit
my_array_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
my_array_np = np.array(my_array_list)
n = 1000000
# @jit
def fun1(my_array_np):
    # += is an in-place operation: it modifies the array that was passed in
    for a in range(n):
        my_array_np += 5
s_np = time.time()
fun1(my_array_np)
end_np = time.time() - s_np
def fuc2(my_array_list):
    for a in range(n):
        my_array_list = [[i + 5 for i in j] for j in my_array_list]
    return my_array_list
s_list = time.time()
my_array_list = fuc2(my_array_list)
end_list = time.time() - s_list
print('my_array_np:', '\n', my_array_np, '\n')
print('my_array_list:', '\n', my_array_list, '\n')
print('time to complete with numpy:', end_np)
print('time to complete with list:', end_list)
my_array_np:
[[500001 500002 500003]
[500004 500005 500006]
[500007 500008 500009]]
my_array_list:
[[500001, 500002, 500003], [500004, 500005, 500006], [500007, 500008, 500009]]
# with numba (@jit enabled)
time to complete with numpy: 0.27802205085754395
time to complete with list: 1.9161949157714844
# without numba (@jit commented out)
time to complete with numpy: 3.4962515830993652
time to complete with list: 1.9761543273925781
[Finished in 3.4s]
Upvotes: 0