Dax Feliz
Dax Feliz

Reputation: 12899

How do I remove NaN values from a NumPy array?

How do I remove NaN values from a NumPy array?

[1, 2, NaN, 4, NaN, 8]   ⟶   [1, 2, 4, 8]

Upvotes: 384

Views: 853848

Answers (13)

jmetz
jmetz

Reputation: 12773

To remove NaN values from a NumPy array x:

x = x[~numpy.isnan(x)]
Explanation

The inner function numpy.isnan returns a boolean/logical array which has the value True everywhere that x is not-a-number. Since we want the opposite, we use the logical-not operator ~ to get an array with Trues everywhere that x is a valid number.

Lastly, we use this logical array to index into the original array x, in order to retrieve just the non-NaN values.

Upvotes: 615

Darren Weber
Darren Weber

Reputation: 1675

pandas introduces an option to convert all data types to missing values.

The np.isnan() function is not compatible with all data types, e.g.

>>> import numpy as np
>>> values = [np.nan, "x", "y"]
>>> np.isnan(values)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

The pd.isna() and pd.notna() functions are compatible with many data types and pandas introduces a pd.NA value:

>>> import numpy as np
>>> import pandas as pd

>>> values = pd.Series([np.nan, "x", "y"])
>>> values
0    NaN
1      x
2      y
dtype: object
>>> values.loc[pd.isna(values)]
0    NaN
dtype: object
>>> values.loc[pd.isna(values)] = pd.NA
>>> values.loc[pd.isna(values)]
0    <NA>
dtype: object
>>> values
0    <NA>
1       x
2       y
dtype: object

#
# using map with lambda, or a list comprehension
#

>>> values = [np.nan, "x", "y"]
>>> list(map(lambda x: pd.NA if pd.isna(x) else x, values))
[<NA>, 'x', 'y']
>>> [pd.NA if pd.isna(x) else x for x in values]
[<NA>, 'x', 'y']

Upvotes: 1

Robin Teuwens
Robin Teuwens

Reputation: 151

In case it helps, for simple 1d arrays:

x = np.array([np.nan, 1, 2, 3, 4])

x[~np.isnan(x)]
>>> array([1., 2., 3., 4.])

but if you wish to expand to matrices and preserve the shape:

x = np.array([
    [np.nan, np.nan],
    [np.nan, 0],
    [1, 2],
    [3, 4]
])

x[~np.isnan(x).any(axis=1)]
>>> array([[1., 2.],
           [3., 4.]])

I encountered this issue when dealing with pandas .shift() functionality, and I wanted to avoid using .apply(..., axis=1) at all cost due to its inefficiency.

Upvotes: 7

bitbang
bitbang

Reputation: 2182

Simply fill with

 x = numpy.array([
 [0.99929941, 0.84724713, -0.1500044],
 [-0.79709026, numpy.NaN, -0.4406645],
 [-0.3599013, -0.63565744, -0.70251352]])

x[numpy.isnan(x)] = .555

print(x)

# [[ 0.99929941  0.84724713 -0.1500044 ]
#  [-0.79709026  0.555      -0.4406645 ]
#  [-0.3599013  -0.63565744 -0.70251352]]

Upvotes: 2

M4urice
M4urice

Reputation: 641

@jmetz's answer is probably the one most people need; however it yields a one-dimensional array, e.g. making it unusable to remove entire rows or columns in matrices.

To do so, one should reduce the logical array to one dimension, then index the target array. For instance, the following will remove rows which have at least one NaN value:

x = x[~numpy.isnan(x).any(axis=1)]

See more detail here.

Upvotes: 26

koliyat9811
koliyat9811

Reputation: 917

As shown by others

x[~numpy.isnan(x)]

works. But it will throw an error if the numpy dtype is not a native data type, for example if it is object. In that case you can use pandas.

x[~pandas.isna(x)] or x[~pandas.isnull(x)]

Upvotes: 10

Markus Dutschke
Markus Dutschke

Reputation: 10596

The accepted answer changes shape for 2d arrays. I present a solution here, using the Pandas dropna() functionality. It works for 1D and 2D arrays. In the 2D case you can choose weather to drop the row or column containing np.nan.

import pandas as pd
import numpy as np

def dropna(arr, *args, **kwarg):
    assert isinstance(arr, np.ndarray)
    dropped=pd.DataFrame(arr).dropna(*args, **kwarg).values
    if arr.ndim==1:
        dropped=dropped.flatten()
    return dropped

x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan ,1700])
y = np.array([[1400, 1500, 1600], [np.nan, 0, np.nan] ,[1700,1800,np.nan]] )


print('='*20+' 1D Case: ' +'='*20+'\nInput:\n',x,sep='')
print('\ndropna:\n',dropna(x),sep='')

print('\n\n'+'='*20+' 2D Case: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna (rows):\n',dropna(y),sep='')
print('\ndropna (columns):\n',dropna(y,axis=1),sep='')

print('\n\n'+'='*20+' x[np.logical_not(np.isnan(x))] for 2D: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna:\n',x[np.logical_not(np.isnan(x))],sep='')

Result:

==================== 1D Case: ====================
Input:
[1400. 1500. 1600.   nan   nan   nan 1700.]

dropna:
[1400. 1500. 1600. 1700.]


==================== 2D Case: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]

dropna (rows):
[[1400. 1500. 1600.]]

dropna (columns):
[[1500.]
 [   0.]
 [1800.]]


==================== x[np.logical_not(np.isnan(x))] for 2D: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]

dropna:
[1400. 1500. 1600. 1700.]

Upvotes: 7

aloha
aloha

Reputation: 4774

If you're using numpy

# first get the indices where the values are finite
ii = np.isfinite(x)

# second get the values
x = x[ii]

Upvotes: 8

A simplest way is:

numpy.nan_to_num(x)

Documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html

Upvotes: -2

Daniel Kislyuk
Daniel Kislyuk

Reputation: 986

For me the answer by @jmetz didn't work, however using pandas isnull() did.

x = x[~pd.isnull(x)]

Upvotes: 46

melissaOu
melissaOu

Reputation: 61

Doing the above :

x = x[~numpy.isnan(x)]

or

x = x[numpy.logical_not(numpy.isnan(x))]

I found that resetting to the same variable (x) did not remove the actual nan values and had to use a different variable. Setting it to a different variable removed the nans. e.g.

y = x[~numpy.isnan(x)]

Upvotes: 6

udibr
udibr

Reputation: 1089

filter(lambda v: v==v, x)

works both for lists and numpy array since v!=v only for NaN

Upvotes: 77

liori
liori

Reputation: 42227

Try this:

import math
print [value for value in x if not math.isnan(value)]

For more, read on List Comprehensions.

Upvotes: 36

Related Questions