Reputation: 129
Can't find nan entries using numpy in array of strings
My code is:
for x in X_cat:
    if x == np.nan:
        print('Found')
I know for a fact there are 2 nan entries in the list, but the code runs without printing anything. The same happens if I replace np.nan with 'nan'. My final objective is to replace the nan entries with the most common string.
Upvotes: 8
Views: 5003
Reputation: 186
Not enough reputation to comment on Thibaut's answer, but to simplify it: the nan string can be obtained with np.str_(np.nan) or even str(np.nan).
import numpy as np
x = np.array(['hello', np.nan, 'world', np.nan], dtype=object)
x[np.where(x.astype(str)==str(np.nan))] = 'mostcommonstring'
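The answer uses 'mostcommonstring' as a stand-in. A minimal sketch of the full replacement, assuming the most common string should be computed from the non-nan entries with collections.Counter (the sample array, including its extra 'hello', is made up for illustration):

```python
from collections import Counter

import numpy as np

# Made-up sample data: two float nans mixed in with strings
x = np.array(['hello', np.nan, 'world', np.nan, 'hello'], dtype=object)

# astype(str) turns each float nan into the string 'nan', same as str(np.nan)
is_nan = x.astype(str) == str(np.nan)

# Most frequent value among the entries that are not nan
most_common = Counter(x[~is_nan]).most_common(1)[0][0]

# Boolean indexing does the same job as np.where here
x[is_nan] = most_common
print(x)
```

One caveat of comparing string forms: an entry that is literally the string 'nan' would be replaced as well.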
Upvotes: 2
Reputation: 824
In an array of strings, you can only perform string comparisons, so you need the nan value in its string form:
str_nan = np.array([np.nan]).astype(str)[0]
And by initializing an array like the one you describe:
x = np.array(['hello', np.nan, 'world', np.nan], dtype=object)
you can then replace these nan entries with the most common string, which I assume to be mostcommonstring:
x[np.where(x.astype(str)==str_nan)]='mostcommonstring'
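To see what str_nan holds, and one limitation of matching on the string form, here is a small sketch; the trailing literal 'nan' string in the sample array is a hypothetical entry added only to show the effect:

```python
import numpy as np

# The string form of nan produced by astype(str)
str_nan = np.array([np.nan]).astype(str)[0]
print(repr(str_nan))

# Hypothetical data: the last entry is the *string* 'nan', not a float NaN
x = np.array(['hello', np.nan, 'world', 'nan'], dtype=object)

mask = x.astype(str) == str_nan
print(mask)  # the real NaN and the literal 'nan' string both match
```

If the data may contain genuine 'nan' strings, a type-aware test such as isinstance(v, float) and np.isnan(v) avoids this false match.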
Upvotes: 5
Reputation: 4679
You simply cannot find np.nan in an array of strings, because np.nan is a number, not a string, and all elements within a numpy array must have the same type.
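A quick way to see the type behaviour this answer describes: without dtype=object, NumPy converts np.nan into the string 'nan' while building a string array, so there is no float NaN left to find; with dtype=object (as in the other answers), each element keeps its own Python type. A sketch with made-up sample values:

```python
import numpy as np

# NumPy picks one string dtype for the whole array and stringifies nan
s = np.array(['hello', np.nan, 'world'])
print(s.dtype, repr(s[1]))

# dtype=object stores the original Python objects instead
o = np.array(['hello', np.nan, 'world'], dtype=object)
print(type(o[1]).__name__)
```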
Upvotes: -1
Reputation: 14399
NaN is sometimes used by programmers as a convenient "filler" that can act like a number and silently propagate. But mathematically, NaN represents expressions like 0/0 that can be essentially any number (if a = 0/0, then a * 0 = 0, which holds for every a, so a can be anything).
Except for an infinitesimally small probability, "any possible number" == "any possible number" is False.
Equality is a wacky concept once you get into nan and inf values (just try wrapping your head around 1+2+3+4+5+... = -1/12). Just use the provided functions like np.isnan.
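A short sketch of the behaviour described above, producing NaN from 0/0 and from inf - inf with NumPy floats (np.errstate only silences the RuntimeWarning NumPy would otherwise emit for these invalid operations):

```python
import numpy as np

with np.errstate(invalid='ignore'):
    a = np.float64(0.0) / np.float64(0.0)        # 0/0 gives nan
    b = np.float64(np.inf) - np.float64(np.inf)  # inf - inf also gives nan

same = a == a          # False: NaN never compares equal, not even to itself
flagged = np.isnan(a)  # True: the supported way to detect NaN
print(same, flagged, np.isnan(b))
```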
Upvotes: 0
Reputation: 234635
That's because an == comparison with NaN, even against NaN itself, evaluates to False. So even when x is np.nan, the print will not run. (In fact, x != x used to be an acceptable way of checking whether x was NaN, since no other IEEE 754 floating-point value compares unequal to itself.)
Use np.isnan(x) to check whether x is NaN.
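The self-comparison trick also works on the question's mixed object array: a string always equals itself and a float NaN never does, so nothing raises on the string entries. A sketch, where is_nan is a hypothetical helper name and the sample array is made up:

```python
import numpy as np

def is_nan(value):
    # Among IEEE 754 floats only NaN is unequal to itself; str values
    # compare equal to themselves, so strings are never flagged
    return value != value

X_cat = np.array(['hello', np.nan, 'world', np.nan], dtype=object)

found = 0
for x in X_cat:
    if is_nan(x):
        print('Found')
        found += 1
```

np.isnan remains clearer when the data is purely numeric; the self-comparison mainly helps with mixed object data where np.isnan would reject the strings.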
Upvotes: 2
Reputation: 8019
You need to check x for NaN with np.isnan:
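import numpy as np
for x in X_cat:
    # np.isnan raises TypeError on strings, so only test float entries
    if isinstance(x, float) and np.isnan(x):
        print('Found')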
np.nan == np.nan returns False, so direct comparison is meaningless here. See the NumPy documentation for isnan for more details.
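np.isnan only accepts numeric input, which matters for the question's data: calling it on a string element, or on a whole dtype=object array, raises a TypeError instead of returning False. A sketch of that failure and of a type-guarded boolean mask (the sample array is made up):

```python
import numpy as np

X_cat = np.array(['hello', np.nan, 'world', np.nan], dtype=object)

raised = False
try:
    np.isnan(X_cat)  # isnan has no loop for dtype=object
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# Apply np.isnan only to float entries (np.nan is a Python float)
mask = np.array([isinstance(v, float) and bool(np.isnan(v)) for v in X_cat])
print(mask)
```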
Upvotes: 1