Reputation: 6531
I read in a dataset as a numpy.ndarray and some of the values are missing (either by just not being there, being NaN, or being the string "NA"). I want to clean out all rows containing any such entry. How do I do that with a numpy ndarray?
Upvotes: 122
Views: 92764
Reputation: 23321
You can also use a masked array via np.ma.fix_invalid to create a mask and filter out "bad" values (such as NaN and inf).
arr = np.array([
[0, 1, np.inf],
[2.2, 3.3, 4.],
[np.nan, 5.5, 6],
[7.8, -np.inf, 9.9],
[10, 11, 12]
])
new_arr = arr[~np.ma.fix_invalid(arr).mask.any(axis=1)]
# array([[ 2.2, 3.3, 4. ],
# [10. , 11. , 12. ]])
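To see how this one-liner works, the intermediate mask can be inspected step by step (a minimal sketch using the same example array):

```python
import numpy as np

arr = np.array([
    [0, 1, np.inf],
    [2.2, 3.3, 4.],
    [np.nan, 5.5, 6],
    [7.8, -np.inf, 9.9],
    [10, 11, 12]
])

# fix_invalid returns a masked array; .mask is True wherever
# the original value was NaN or +/-inf
mask = np.ma.fix_invalid(arr).mask

# any(axis=1) flags every row that contains at least one invalid entry
bad_rows = mask.any(axis=1)
print(bad_rows)  # [ True False  True  True False]

# ~ inverts the flags, so boolean indexing keeps only the clean rows
clean = arr[~bad_rows]
print(clean)
```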
If the array contains strings such as 'NA', then np.where may be useful to "mask" these values (replacing them with NaN) and then filter them out.
arr = np.array([
[0, 1, 'N/A'],
[2.2, 3.3, 4.],
[np.nan, 5.5, 6],
[7.8, 'NA', 9.9],
[10, 11, 12]
], dtype=object)
tmp = np.where(np.isin(arr, ['NA', 'N/A']), np.nan, arr).astype(float)
new_arr = tmp[~np.isnan(tmp).any(axis=1)]
# array([[ 2.2, 3.3, 4. ],
# [10. , 11. , 12. ]])
Upvotes: 1
Reputation: 213025
>>> a = np.array([[1, 2, 3], [4, 5, np.nan], [7, 8, 9]])
>>> a
array([[ 1.,  2.,  3.],
       [ 4.,  5., nan],
       [ 7.,  8.,  9.]])
>>> a[~np.isnan(a).any(axis=1)]
array([[ 1., 2., 3.],
[ 7., 8., 9.]])
and reassign this to a.

Explanation: np.isnan(a) returns a boolean array of the same shape, True where NaN and False elsewhere. .any(axis=1) reduces an m*n array to m entries by applying a logical or across each whole row; ~ inverts True/False, and a[ ] selects just those rows of the original array that have True within the brackets.
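The steps above can be checked one at a time (a small sketch with the same 3x3 example):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, np.nan], [7, 8, 9]])

nan_mask = np.isnan(a)              # True where NaN, same shape as a
row_has_nan = nan_mask.any(axis=1)  # logical or over each row -> shape (3,)
keep = ~row_has_nan                 # invert: True marks rows to keep

print(row_has_nan)  # [False  True False]

a = a[keep]         # boolean indexing drops the middle row
print(a)
```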
Upvotes: 198