Reputation: 693
I have many arrays like the one below, in which I want to replace each 0 entry with the closest nonzero entry at a lower index (that is, the nearest preceding nonzero value). This can be done easily using a for loop:
import numpy as np
input_array = np.array([0.01561, 0.01561, 0.02039, 0.02039, 0.02776, 0.02776,
                        0.03997, 0., 0.03997, 0.06243, 0., 0., 0.0624662,
                        0.11105, 0., 0., 0., 0.11105, 0.24986,
                        0., 0., 0., 0., 0., 0.,
                        0.24986])
for i in range(0, len(input_array)):
    if input_array[i] == 0:
        input_array[i] = input_array[i-1]
Could anyone advise whether it is worth the effort to replace the loop with a vectorized version?
Upvotes: 0
Views: 388
Reputation: 231738
Applying the numpy solution from "Most efficient way to forward-fill NaN values in numpy array":
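def foo2(arr):
    # keep each nonzero entry's own index; zeros get index 0
    idx = np.where(arr == 0, 0, np.arange(len(arr)))
    # running maximum carries the last nonzero index forward
    idx = np.maximum.accumulate(idx)
    return arr[idx]

def foo1(arr):
    # loop version from the question, working on a copy
    arr = arr.copy()
    for i in range(len(arr)):
        if arr[i] == 0:
            arr[i] = arr[i-1]
    return arr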
For your test array, arr, the speed improvement is modest:
In [67]: timeit foo1(arr)
100000 loops, best of 3: 18.1 µs per loop
In [68]: timeit foo2(arr)
The slowest run took 1387.12 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.4 µs per loop
But with a larger array, the loop's time grows with the size, while the array version's time barely changes:
In [69]: arr1=np.concatenate((arr,arr,arr,arr,arr,arr,arr))
In [70]: timeit foo1(arr1)
10000 loops, best of 3: 116 µs per loop
In [71]: timeit foo2(arr1)
The slowest run took 4.16 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.6 µs per loop
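The timings above come from an IPython session. If you want to repeat the comparison from a plain script, something like the following sketch should work; it uses the standard `timeit` module, and the repeat counts are arbitrary choices of mine, not anything from the session above. Absolute numbers will differ by machine, so no output is shown.

```python
import timeit
import numpy as np

arr = np.array([0.01561, 0.01561, 0.02039, 0.02039, 0.02776, 0.02776,
                0.03997, 0., 0.03997, 0.06243, 0., 0., 0.0624662,
                0.11105, 0., 0., 0., 0.11105, 0.24986,
                0., 0., 0., 0., 0., 0.,
                0.24986])

def foo1(arr):
    # loop version: copy each zero from its left neighbour
    arr = arr.copy()
    for i in range(len(arr)):
        if arr[i] == 0:
            arr[i] = arr[i - 1]
    return arr

def foo2(arr):
    # vectorized version: index of the last nonzero entry seen so far
    idx = np.where(arr == 0, 0, np.arange(len(arr)))
    idx = np.maximum.accumulate(idx)
    return arr[idx]

# same 7x enlarged array as in the session above
arr1 = np.concatenate([arr] * 7)

# both versions should agree before their timings are compared
assert np.array_equal(foo1(arr1), foo2(arr1))

for name, fn in (("foo1", foo1), ("foo2", foo2)):
    # number/repeat are arbitrary; take the best of 3 runs of 1000 calls
    best = min(timeit.repeat(lambda: fn(arr1), number=1000, repeat=3)) / 1000
    print(f"{name}: {best * 1e6:.1f} µs per call")
```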
The details of the idx construction:
In [72]: idx=np.arange(len(arr))
In [73]: idx[arr==0]=0
In [74]: idx
Out[74]:
array([ 0, 1, 2, 3, 4, 5, 6, 0, 8, 9, 0, 0, 12, 13, 0, 0, 0, 17, 18, 0, 0, 0, 0, 0, 0, 25])
In [75]: idx=np.maximum.accumulate(idx)
In [76]: idx
Out[76]:
array([ 0, 1, 2, 3, 4, 5, 6, 6, 8, 9, 9, 9, 12, 13, 13, 13, 13, 17, 18, 18, 18, 18, 18, 18, 18, 25], dtype=int32)
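Two things worth checking before using this on your own data. First, a zero at the very start has no preceding value: the index trick leaves it as 0, whereas the loop reads `arr[i-1]` with `i == 0`, i.e. the last element. Second, since you mention having many such arrays, the same idea can probably be applied to a 2-D stack row by row. The sketch below illustrates both; `forward_fill_zeros` and `forward_fill_zeros_rows` are names I made up for illustration, and the 2-D variant assumes one array per row.

```python
import numpy as np

def forward_fill_zeros(arr):
    """1-D: replace each 0 with the nearest preceding nonzero value."""
    idx = np.where(arr == 0, 0, np.arange(len(arr)))
    idx = np.maximum.accumulate(idx)
    return arr[idx]

def forward_fill_zeros_rows(mat):
    """Hypothetical 2-D variant: forward-fill zeros along each row."""
    cols = np.arange(mat.shape[1])
    # column indices are broadcast across the rows; zeros get index 0
    idx = np.where(mat == 0, 0, cols)
    # running maximum along each row
    idx = np.maximum.accumulate(idx, axis=1)
    rows = np.arange(mat.shape[0])[:, None]
    return mat[rows, idx]

# a leading zero has nothing before it, so it stays 0
sample = np.array([0.0, 0.5, 0.0, 0.0, 0.7, 0.0])
filled = forward_fill_zeros(sample)
print(filled)  # values: 0, 0.5, 0.5, 0.5, 0.7, 0.7

# each row is filled independently of the others
stack = np.array([[1.0, 0.0, 2.0, 0.0],
                  [0.0, 3.0, 0.0, 0.0]])
filled_rows = forward_fill_zeros_rows(stack)
print(filled_rows)  # rows: 1, 1, 2, 2 and 0, 3, 3, 3
```

If leading zeros can occur in your arrays, you will need to decide separately what should go there.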
Upvotes: 1