Hallgeir Wilhelmsen

Reputation: 1134

Remove nans in multidimensional array

I have a multidimensional array. Example (in 2D):

x = np.array([[     1.,      1.,  np.nan,  np.nan],
              [     2.,  np.nan,      2.,  np.nan],
              [ np.nan,      3.,  np.nan,  np.nan]])

Is there an easy, efficient way to "compress" / "squeeze" / "push" the nans out of it, along an axis? I mean, so that the output (here: axis=0) would become:

np.array([[  1.,  1.,  np.nan,  np.nan],
          [  2.,  3.,      2.,  np.nan]])

It should also work with more than 2 dimensions.

Upvotes: 1

Views: 1650

Answers (1)

Paul Panzer

Reputation: 53029

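You can use argsort on the mask of non-nan elements. Because False sorts before True, the nans end up first along the axis and the non-nan elements last; a stable sort algorithm (like mergesort) preserves the original order of the non-nan elements. Every column contains at least cut nans, where cut is the smallest nan count of any column, so the first cut sorted rows are all-nan and can be dropped: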

mask = np.isnan(x)
cut = np.min(np.count_nonzero(mask, axis=0))
x[np.argsort(~mask, axis=0, kind='mergesort')[cut:], np.arange(x.shape[1])]

Output:

array([[  1.,   1.,  nan,  nan],
       [  2.,   3.,   2.,  nan]])
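To make the mechanics easier to follow, here is a small sketch that steps through the 2D example from the question and keeps the intermediate arrays. The names order and result are only illustrative and do not appear in the snippet above.

```python
import numpy as np

x = np.array([[1.,     1.,     np.nan, np.nan],
              [2.,     np.nan, 2.,     np.nan],
              [np.nan, 3.,     np.nan, np.nan]])

mask = np.isnan(x)

# Stable argsort of ~mask per column: False (nan) sorts before True (non-nan),
# and ties keep their original row order.
order = np.argsort(~mask, axis=0, kind='mergesort')
# order is [[2, 1, 0, 0],
#           [0, 0, 2, 1],
#           [1, 2, 1, 2]]

# Smallest nan count of any column; here 1 (columns 0 and 1).
cut = np.min(np.count_nonzero(mask, axis=0))

# Drop the first `cut` sorted rows, which hold only nans, and gather the rest.
result = x[order[cut:], np.arange(x.shape[1])]
print(result)
```

Columns with more than cut nans keep their surplus nans at the top of the result, which is why the third column reads nan, 2.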

ND-version:

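import numpy as np

def nan_bouncer(x, axis=0):
    """Push nans to the front along `axis` and trim the leading all-nan slices."""
    if axis != 0:
        # work along axis 0 and move the result back at the end
        x = np.moveaxis(x, axis, 0)
    mask = np.isnan(x)
    # smallest number of nans in any 1D lane along axis 0
    cut = np.min(np.count_nonzero(mask, axis=0))
    # open grid over the remaining axes; it broadcasts against the sort order
    idx = tuple(np.ogrid[tuple(map(slice, x.shape[1:]))])
    # the stable sort keeps the non-nan elements in their original order
    res = x[(np.argsort(~mask, axis=0, kind='mergesort')[cut:],) + idx]
    return res if axis == 0 else np.moveaxis(res, 0, axis)

# demo: random floats in {0, 1, 2}, with every 0 replaced by nan
data = np.random.randint(0, 3, (3, 4, 4)).astype(float)
data[data == 0] = np.nan

print(data)
print(nan_bouncer(data))
print(nan_bouncer(data, 2))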

Sample output:

[[[ nan   1.   2.   1.]
  [  2.  nan  nan   2.]
  [  2.   1.   1.   2.]
  [  1.   1.   2.  nan]]

 [[ nan  nan   2.   1.]
  [  2.   2.  nan   1.]
  [  2.   2.   2.   2.]
  [  2.   2.  nan   1.]]

 [[  1.   1.  nan  nan]
  [  1.   1.   2.   1.]
  [  2.  nan   2.   1.]
  [  1.   1.   1.   2.]]]


[[[ nan  nan  nan  nan]
  [  2.  nan  nan   2.]
  [  2.  nan   1.   2.]
  [  1.   1.  nan  nan]]

 [[ nan   1.   2.   1.]
  [  2.   2.  nan   1.]
  [  2.   1.   2.   2.]
  [  2.   2.   2.   1.]]

 [[  1.   1.   2.   1.]
  [  1.   1.   2.   1.]
  [  2.   2.   2.   1.]
  [  1.   1.   1.   2.]]]


[[[ nan   1.   2.   1.]
  [ nan  nan   2.   2.]
  [  2.   1.   1.   2.]
  [ nan   1.   1.   2.]]

 [[ nan  nan   2.   1.]
  [ nan   2.   2.   1.]
  [  2.   2.   2.   2.]
  [ nan   2.   2.   1.]]

 [[ nan  nan   1.   1.]
  [  1.   1.   2.   1.]
  [ nan   2.   2.   1.]
  [  1.   1.   1.   2.]]]
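As a possible variant, the same gather can be written with np.take_along_axis (available since NumPy 1.15), which avoids moving the axis and building the ogrid index by hand. This is only a sketch under that assumption; the name nan_bouncer_tal is hypothetical and not part of the answer above.

```python
import numpy as np

def nan_bouncer_tal(x, axis=0):
    """Sketch: same idea as nan_bouncer, gathered with np.take_along_axis."""
    mask = np.isnan(x)
    # smallest number of nans in any 1D lane along `axis`
    cut = np.min(np.count_nonzero(mask, axis=axis))
    # stable sort: nans (False in ~mask) first, non-nans keep their order
    order = np.argsort(~mask, axis=axis, kind='mergesort')
    # drop the first `cut` positions along `axis`
    keep = [slice(None)] * x.ndim
    keep[axis] = slice(cut, None)
    return np.take_along_axis(x, order[tuple(keep)], axis=axis)

x = np.array([[1.,     1.,     np.nan, np.nan],
              [2.,     np.nan, 2.,     np.nan],
              [np.nan, 3.,     np.nan, np.nan]])

out0 = nan_bouncer_tal(x, axis=0)
out1 = nan_bouncer_tal(x, axis=1)
print(out0)
print(out1)
```

Along axis=1 every row of this x has at least two nans, so the result keeps only two columns.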

Upvotes: 4
