Peter Lynch

Reputation: 129

Can't find nan entries using numpy in array of strings

My code is:

for x in X_cat:
    if x == np.nan:
        print('Found')

I know for a fact there are 2 nan entries in the list, but the code runs without printing anything. The same happens if I replace np.nan with 'nan'. My final objective is to replace the nan entries with the most common string.
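A minimal reproduction of the problem, assuming X_cat is an object array holding strings and np.nan (the sample values here are hypothetical). Neither comparison ever matches a NaN entry:

```python
import numpy as np

# Hypothetical sample data: an object array with two NaN entries
X_cat = np.array(['red', np.nan, 'blue', 'red', np.nan], dtype=object)

# nan == nan is False, and a float nan never equals the string 'nan'
found_nan = sum(1 for x in X_cat if x == np.nan)
found_str = sum(1 for x in X_cat if x == 'nan')
print(found_nan, found_str)  # 0 0
```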

Upvotes: 8

Views: 5003

Answers (6)

thomaskolasa

Reputation: 186

Not enough reputation to comment on Thibaut's answer, but to simplify it: the nan string can be obtained with np.str_(np.nan) or even just str(np.nan).

x = np.array(['hello', np.nan, 'world', np.nan], dtype=object)

x[np.where(x.astype(str)==str(np.nan))] = 'mostcommonstring'
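A slightly expanded sketch of the same idea: both ways of building the nan string give 'nan', and a boolean mask works in place of np.where (the replacement value is still just a placeholder):

```python
import numpy as np

x = np.array(['hello', np.nan, 'world', np.nan], dtype=object)

# str(np.nan) and np.str_(np.nan) both give the string 'nan'
print(str(np.nan), np.str_(np.nan))  # nan nan

# Boolean mask of the entries whose string form is 'nan'
mask = x.astype(str) == str(np.nan)
x[mask] = 'mostcommonstring'
print(x)
```

Note that a genuine string 'nan' in the data would also match this mask, since the comparison happens after converting everything to strings.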

Upvotes: 2

Thibaut Loiseleur

Reputation: 824

In an array of strings, you can only perform string comparisons, so you first have to build a nan in string format:

str_nan = np.array([np.nan]).astype(str)[0]

Then, after initializing an array like the one you describe:

x = np.array(['hello', np.nan, 'world', np.nan], dtype=object)

you can replace these nan entries with the most common string, which I assume to be mostcommonstring:

x[np.where(x.astype(str)==str_nan)]='mostcommonstring'
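Rather than assuming the most common string, it can be computed from the non-nan entries. A minimal sketch using collections.Counter (the sample data is hypothetical):

```python
from collections import Counter

import numpy as np

x = np.array(['hello', np.nan, 'world', 'hello', np.nan], dtype=object)

is_nan = x.astype(str) == 'nan'
# Count only the non-nan entries so nan can never be the "most common" value
most_common = Counter(x[~is_nan]).most_common(1)[0][0]
x[is_nan] = most_common
print(most_common)  # hello
```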

Upvotes: 5

BlackJack

Reputation: 4679

You simply cannot find np.nan in an array of strings, because np.nan is a number, not a string, and all elements of a NumPy array must have the same type.
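A quick check of what happens when the array gets a string dtype (i.e. no dtype=object, unlike the examples in the other answers): NumPy converts np.nan to the string 'nan' when building the array, so only a string comparison can find it.

```python
import numpy as np

# Without dtype=object, NumPy picks a common string dtype for all elements
x = np.array(['hello', np.nan, 'world'])
print(x.dtype.kind)   # U
print(x[1] == 'nan')  # True
```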

Upvotes: -1

Daniel F

Reputation: 14399

NaN is sometimes used by programmers as a convenient "filler" that can act like a number and silently propagate. Mathematically, however, NaN represents expressions like 0/0 that could be essentially any number (if a = 0/0, then a * 0 = 0, so a can be anything).

Except with infinitesimally small probability, "any possible number" == "any possible number" is False.

Equality is a wacky concept once you get into nan and inf values (just try wrapping your head around 1+2+3+4+5+... = -1/12). Just use the provided functions such as np.isnan.
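A small illustration of the point above: 0/0 in NumPy produces NaN, NaN never equals itself, and np.isnan is the reliable test (the warning NumPy normally emits for 0/0 is suppressed here with np.errstate):

```python
import numpy as np

with np.errstate(invalid='ignore'):
    a = np.float64(0.0) / np.float64(0.0)  # 0/0 gives NaN

print(np.isnan(a))        # True
print(a == a)             # False
print(np.nan == np.nan)   # False
```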

Upvotes: 0

Bathsheba

Reputation: 234635

That's because any == comparison with NaN, including NaN itself, is False. So even when x is np.nan, the print will not run. (In fact, x != x used to be an accepted way of checking whether x was NaN, as no other IEEE 754 floating-point value compares unequal to itself.)

Use np.isnan(x) to check if x is NaN.
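Because np.isnan raises a TypeError when given a string, the self-inequality check mentioned above can be handier for a mixed array of strings and NaN. A minimal sketch with hypothetical sample data:

```python
import numpy as np

X_cat = np.array(['red', np.nan, 'blue', np.nan], dtype=object)

# NaN is the only value that compares unequal to itself; strings compare equal
nan_positions = [i for i, x in enumerate(X_cat) if x != x]
print(nan_positions)  # [1, 3]
```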

Upvotes: 2

Oleh Rybalchenko

Reputation: 8019

You need to check x for NaN with np.isnan:

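import numpy as np

for x in X_cat:
    # np.isnan raises TypeError on strings, so only test float entries
    if isinstance(x, float) and np.isnan(x):
        print('Found')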

np.nan == np.nan returns False, so direct comparison is meaningless here. Find out more about isnan in the NumPy docs.
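Calling np.isnan on the whole object array (or on a single string) fails, because the isnan ufunc has no implementation for those types. A sketch showing the error and a vectorized-looking mask built with the isinstance guard (sample data is hypothetical):

```python
import numpy as np

X_cat = np.array(['red', np.nan, 'blue', np.nan], dtype=object)

try:
    np.isnan(X_cat)  # object arrays are not supported by np.isnan
except TypeError as exc:
    print('TypeError:', exc)

# Guarding with isinstance means np.isnan only ever sees floats
mask = np.array([isinstance(x, float) and np.isnan(x) for x in X_cat])
print(mask)
```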

Upvotes: 1
