Reputation: 23
I have an array that contains NaN values or zeros as shown below. I would like to go through the array and replace every 0 with an integer, in an increasing sequence. I.e., the first zero becomes "1", the next zero becomes "2", then "3", etc.
Input:
arrayOfZeros =
array([[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[ 0., nan, nan, nan, nan],
[ 0., nan, 0., nan, 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[nan, 0., 0., 0., 0.],
[nan, 0., nan, nan, nan],
[nan, nan, 0., nan, nan],
[ 0., nan, 0., nan, 0.],
[ 0., nan, 0., nan, 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[nan, nan, 0., 0., 0.],
[nan, nan, nan, nan, 0.],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan]])
The desired output:
array([[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[ 1., nan, nan, nan, nan],
[ 2., nan, 19., nan, 39.],
[ 3., 11., 20., 31., 40.],
[ 4., 12., 21., 32., 41.],
[nan, 13., 22., 33., 42.],
[nan, 14., nan, nan, nan],
[nan, nan, 23., nan, nan],
[ 5., nan, 24., nan, 43.],
[ 6., nan, 25., nan, 44.],
[ 7., 15., 26., 34., 45.],
[ 8., 16., 27., 35., 46.],
[ 9., 17., 28., 36., 47.],
[10., 18., 29., 37., 48.],
[nan, nan, 30., 38., 49.],
[nan, nan, nan, nan, 50.],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan]])
Currently, I can almost do exactly what I want with the following code:
with np.nditer(arrayOfZeros, op_flags=['readwrite']) as y:
    preference = 1
    for x in y:
        if x == 0:
            x[...] = preference
            preference += 1
However, if I run this code outside of the Python Console, I get the following error message:
TypeError: Iterator operand or requested dtype holds references, but the REFS_OK flag was not enabled
Is there another way to accomplish this in NumPy?
Upvotes: 2
Views: 359
Reputation: 231335
Why did you use nditer? You basically got it working, which wasn't a trivial task. But you somehow missed the message that it isn't a speed tool, at least not when used in Python code. Plain iteration is usually just as good, unless you are doing some fancy broadcasting. But as the other answers show, a non-iterative approach is even better.
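For comparison, here is a minimal sketch of what "plain iteration" could look like for this task. The small array is a made-up stand-in for the one in the question, and the names out and counter are illustrative only. It numbers the zeros down each column, which is the layout the question asks for.

```python
import numpy as np

nan = np.nan
# Small stand-in for the array in the question (values chosen for illustration).
arr = np.array([[nan, 0., nan],
                [0., 0., nan],
                [0., nan, 0.]])

out = arr.copy()
counter = 1
# Outer loop over columns, inner loop over rows, so numbering runs down each column.
for j in range(out.shape[1]):
    for i in range(out.shape[0]):
        if out[i, j] == 0:      # NaN == 0 is False, so NaN cells are skipped
            out[i, j] = counter
            counter += 1

print(out)
```

This is no faster than nditer, but it needs no iterator flags and makes the visiting order explicit.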
But let's focus on nditer: https://numpy.org/devdocs/reference/arrays.nditer.html
Recreate your array:
In [1]: nan=np.nan
In [2]: arr = np.array([[nan, nan, nan, nan, nan],
...: [nan, nan, nan, nan, nan],
...: [ 0., nan, nan, nan, nan],
...: [ 0., nan, 0., nan, 0.],
...: [ 0., 0., 0., 0., 0.],
...: [ 0., 0., 0., 0., 0.],
...: [nan, 0., 0., 0., 0.],
...: [nan, 0., nan, nan, nan],
...
In [3]: arrayOfZeros = arr.copy()
In [4]: arr.dtype
Out[4]: dtype('float64')
In [5]: with np.nditer(arrayOfZeros, op_flags=['readwrite']) as y:
   ...:     preference = 1
   ...:     for x in y:
   ...:         if x == 0:
   ...:             x[...] = preference
   ...:             preference += 1
   ...:
In [6]: arrayOfZeros
Out[6]:
array([[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[ 1., nan, nan, nan, nan],
[ 2., nan, 3., nan, 4.],
[ 5., 6., 7., 8., 9.],
[10., 11., 12., 13., 14.],
[nan, 15., 16., 17., 18.],
[nan, 19., nan, nan, nan],
...
OK, it works, but the layout of consecutive numbers doesn't match your display: here the numbers run along the rows, while your desired output runs down the columns. Your display is forcing all the other answers to do contortions with transpose.
If I change the dtype of the array to object, I get your error:
In [7]: arrayOfZeros = arr.astype(object)
In [8]: with np.nditer(arrayOfZeros, op_flags=['readwrite']) as y:
   ...:     preference = 1
   ...:     for x in y:
   ...:         if x == 0:
   ...:             x[...] = preference
   ...:             preference += 1
   ...:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-7dd225a24a36> in <module>
----> 1 with np.nditer(arrayOfZeros, op_flags=['readwrite']) as y:
2 preference = 1
3 for x in y:
4 if x == 0:
5 x[...] = preference
TypeError: Iterator operand or requested dtype holds references, but the REFS_OK flag was not enabled
Making the suggested fix: https://docs.scipy.org/doc/numpy/reference/generated/numpy.nditer.html
In [10]: with np.nditer(arrayOfZeros, flags=['refs_ok'], op_flags=['readwrite']) as y:
    ...:     preference = 1
    ...:     for x in y:
    ...:         if x == 0:
    ...:             x[...] = preference
    ...:             preference += 1
    ...:
In [11]: arrayOfZeros
Out[11]:
array([[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[1, nan, nan, nan, nan],
[2, nan, 3, nan, 4],
[5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[nan, 15, 16, 17, 18],
[nan, 19, nan, nan, nan],
It doesn't display in neat columns because of the object dtype.
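If the array really does arrive with object dtype, an alternative to refs_ok is to convert it to float before iterating. The sketch below assumes every element is a number or NaN, so that astype(float) succeeds. The tiny obj_arr is invented for illustration.

```python
import numpy as np

nan = np.nan
# Hypothetical object-dtype input; the question's real array is assumed to look similar.
obj_arr = np.array([[nan, 0.0],
                    [0.0, 0.0]], dtype=object)

# A float array holds no Python object references, so nditer needs no 'refs_ok' flag.
float_arr = obj_arr.astype(float)

counter = 1
with np.nditer(float_arr, op_flags=['readwrite']) as it:
    for x in it:
        if x == 0:
            x[...] = counter
            counter += 1

print(float_arr.dtype)
print(float_arr)
```

The result is a float64 array again, so it also prints in aligned columns.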
If I change the array to order='F', we get the consecutive numbers going down the columns:
In [12]: arrayOfZeros = arr.copy(order='F')
In [14]: with np.nditer(arrayOfZeros, op_flags=['readwrite']) as y:
...:
In [15]: arrayOfZeros
Out[15]:
array([[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[ 1., nan, nan, nan, nan],
[ 2., nan, 19., nan, 39.],
[ 3., 11., 20., 31., 40.],
[ 4., 12., 21., 32., 41.],
[nan, 13., 22., 33., 42.],
[nan, 14., nan, nan, nan],
....
The order 'F' and the object dtype make me wonder: is the source of this array a pandas DataFrame?
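Copying the array with order='F' is one way to get column-wise numbering. nditer also accepts an order argument of its own, so a sketch like the following should give the same numbering without changing the memory layout. The small array is again a made-up stand-in.

```python
import numpy as np

nan = np.nan
arr = np.array([[nan, 0., nan],
                [0., 0., nan],
                [0., nan, 0.]])

out = arr.copy()  # still C-ordered in memory
counter = 1
# order='F' asks the iterator to visit elements column by column.
with np.nditer(out, op_flags=['readwrite'], order='F') as it:
    for x in it:
        if x == 0:
            x[...] = counter
            counter += 1

print(out)
```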
Upvotes: 0
Reputation: 53029
Why is everybody insisting on using the cumsum here? It's wasteful. Better:
out = arrayOfZeros.copy()
z = out==out
out.T[z.T] = np.arange(1,1+np.count_nonzero(z))
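A short walk-through of why these three lines work, on a made-up 3x3 array. Note the assumption built into out==out: it selects every non-NaN cell, so this only matches the task when all non-NaN entries are zeros to be replaced. The name valid is used here in place of z for readability.

```python
import numpy as np

nan = np.nan
arr = np.array([[nan, 0., nan],
                [0., 0., nan],
                [0., nan, 0.]])

out = arr.copy()
# NaN never equals itself, so this mask is True exactly at the non-NaN cells.
valid = out == out
# out.T is a view of out; boolean indexing on the view visits cells in
# row-major order of the transpose, i.e. column by column of the original.
out.T[valid.T] = np.arange(1, 1 + np.count_nonzero(valid))

print(valid)
print(out)
```

Because the assignment goes through a view, out itself is modified and no extra pass over the data is needed.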
Timings (microseconds per call, as computed by the code below):
5.025142431259155 # PP
38.67108239792287 # cumsum 1 rafaelc
9.263199986889958 # cumsum 2 Derek Eden
9.044178808107972 # cumsum 3 Onyambu
10.640528565272689 # cumsum 4 Andy L.
Code:
import numpy as np
array,nan = np.array,np.nan
x = \
array([[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[ 0., nan, nan, nan, nan],
[ 0., nan, 0., nan, 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[nan, 0., 0., 0., 0.],
[nan, 0., nan, nan, nan],
[nan, nan, 0., nan, nan],
[ 0., nan, 0., nan, 0.],
[ 0., nan, 0., nan, 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[nan, nan, 0., 0., 0.],
[nan, nan, nan, nan, 0.],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan]])
from timeit import timeit
def f_pp():
    out = x.copy()
    z = out==out
    out.T[z.T] = np.arange(1,1+np.count_nonzero(z))
    return out
def f_cumsum():
    arr = x.copy()
    mask = ~np.isnan(arr)
    arr[mask] = np.nan_to_num(arr + 1).ravel('F').cumsum().reshape(arr.shape, order='F')[mask]
    return arr
def f_cumsum_2():
    arr = x.copy()
    in_arr = arr.T
    fill = (in_arr==0).cumsum().reshape(in_arr.shape)
    return (in_arr + fill).T
def f_cumsum_3():
    arrayOfZeros = x.copy()
    mask = arrayOfZeros==0
    arrayOfZeros.T[mask.T] = mask.T.cumsum()[mask.T.flatten()]
    return arrayOfZeros
def f_cumsum_4():
    arrayOfZeros = x.copy()
    m = (arrayOfZeros == 0)
    a = (arrayOfZeros.T == 0).cumsum().reshape(-1, arrayOfZeros.shape[0]).T
    arrayOfZeros[m] = a[m]
    return arrayOfZeros
assert(np.nan_to_num(f_pp()) == np.nan_to_num(f_cumsum())).all()
assert(np.nan_to_num(f_pp()) == np.nan_to_num(f_cumsum_2())).all()
assert(np.nan_to_num(f_pp()) == np.nan_to_num(f_cumsum_3())).all()
assert(np.nan_to_num(f_pp()) == np.nan_to_num(f_cumsum_4())).all()
for f in (f_pp,f_cumsum,f_cumsum_2,f_cumsum_3,f_cumsum_4):
    print(timeit(f,number=10000)*100)
Upvotes: 2
Reputation: 25239
Create a boolean mask m that is True where the array is 0. Use transpose, cumsum, and reshape to build an array holding the running count of zeros. Finally, assign through the mask m:
m = (arrayOfZeros == 0)
a = (arrayOfZeros.T == 0).cumsum().reshape(-1, arrayOfZeros.shape[0]).T
arrayOfZeros[m] = a[m]
Out[353]:
array([[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[ 1., nan, nan, nan, nan],
[ 2., nan, 19., nan, 39.],
[ 3., 11., 20., 31., 40.],
[ 4., 12., 21., 32., 41.],
[nan, 13., 22., 33., 42.],
[nan, 14., nan, nan, nan],
[nan, nan, 23., nan, nan],
[ 5., nan, 24., nan, 43.],
[ 6., nan, 25., nan, 44.],
[ 7., 15., 26., 34., 45.],
[ 8., 16., 27., 35., 46.],
[ 9., 17., 28., 36., 47.],
[10., 18., 29., 37., 48.],
[nan, nan, 30., 38., 49.],
[nan, nan, nan, nan, 50.],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan]])
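To make the transpose, cumsum and reshape steps easier to follow, here is the same recipe on a made-up 2x3 array with the intermediate results kept in separate variables. The names running and out are illustrative only.

```python
import numpy as np

nan = np.nan
arr = np.array([[0., nan, 0.],
                [0., 0., nan]])

m = (arr == 0)
# Running count of zeros over the transposed array, i.e. column by column of arr.
running = (arr.T == 0).cumsum()            # flat, length arr.size
# Reshape to (n_cols, n_rows), then transpose back to arr's own shape.
a = running.reshape(-1, arr.shape[0]).T
out = arr.copy()
out[m] = a[m]                              # only the zero cells receive a number

print(running)
print(a)
print(out)
```

The non-square shape shows why the reshape uses arr.shape[0] as the second dimension: the counts are laid out in the shape of the transpose before being flipped back.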
Upvotes: 0
Reputation: 79188
mask = arrayOfZeros==0
arrayOfZeros.T[mask.T] = mask.T.cumsum()[mask.T.flatten()]
array([[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[ 1., nan, nan, nan, nan],
[ 2., nan, 19., nan, 39.],
[ 3., 11., 20., 31., 40.],
[ 4., 12., 21., 32., 41.],
[nan, 13., 22., 33., 42.],
[nan, 14., nan, nan, nan],
[nan, nan, 23., nan, nan],.....
Upvotes: 0
Reputation: 4618
You could also take this approach:
arr #just for example
array([[ 0., nan, 0., nan, nan, 0., 0.],
[ 0., 0., 0., nan, nan, nan, 0.]])
in_arr = arr.T
fill = (in_arr==0).cumsum().reshape(in_arr.shape)
out_arr = (in_arr + fill).T
output:
array([[ 1., nan, 4., nan, nan, 6., 7.],
[ 2., 3., 5., nan, nan, nan, 8.]])
Upvotes: 0
Reputation: 59264
Use broadcasting. Save the mask with isnan, then use ravel() with 'F' ordering plus cumsum for vectorized summation.
mask = ~np.isnan(arr)
arr[mask] = np.nan_to_num(arr + 1).ravel('F').cumsum().reshape(arr.shape, order='F')[mask]
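The one-liner packs several steps together. The sketch below splits it up on a made-up 2x3 array; the intermediate names step and counts are illustrative and not part of the original answer.

```python
import numpy as np

nan = np.nan
arr = np.array([[0., nan, 0.],
                [0., 0., nan]])

mask = ~np.isnan(arr)
# arr + 1 turns each 0 into 1 and leaves NaN as NaN; nan_to_num maps NaN to 0.
step = np.nan_to_num(arr + 1)
# Flatten column by column, accumulate, and restore the shape in the same order.
counts = step.ravel('F').cumsum().reshape(arr.shape, order='F')
out = arr.copy()
out[mask] = counts[mask]

print(step)
print(counts)
print(out)
```

Like the out==out trick, this assumes every non-NaN entry is a zero, since arr + 1 is what feeds the running sum.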
Since you tagged pandas, if you have a df you may cumsum directly, since it skips NaN.
pd.DataFrame(arr.ravel('F')).add(1).cumsum().to_numpy().reshape(arr.shape, order='F')
Upvotes: 2