Replacing NaN with pandas series.map(dict)

Question

I'm following a pandas tutorial that shows replacing values in columns by passing a dictionary to the series.map method. Here's a snippet from the tutorial:

However when I try this:

cols = star_wars.columns[3:9]

# Booleans for column values
answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        NaN:False
        }

for c in cols:
    star_wars[c] = star_wars[c].map(answers)

I get NameError: name 'NaN' is not defined

So what am I doing wrong?

edit: To explain my goal a little better, I have columns that look like this:

And I'm trying to replace the NaNs with False and the non-NaNs with True.

edit 2: Here's an image of the problem I'm still facing after changing NaN to np.NaN:

Then if I rerun the mapping cell and display the output again, all the False and NaN values flip-flop.

miradulo · Accepted Answer

Quite simply, Python doesn't have a built-in NaN name. NumPy does, however, so you can get your mapping to stop throwing an error by using np.nan. There is also math.nan, which is equivalent to float('nan'), as Jon pointed out.

answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        np.nan:False
        }

Don't stop here though, because that won't work. The other tricky thing is that NaN doesn't compare equal to anything, not even itself, so using it as a key in a mapping like this won't be effective.

>>> np.nan == np.nan 
False

Thus, the NaN values in your DataFrame won't be matched by the np.nan key anyway, and will remain NaN. For a further explanation of this, see NaNs as key in dictionaries. Furthermore, I would wager that your nan values are actually the string nan.
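To make the point about NaN keys concrete, here is a minimal sketch using only the standard library. It assumes CPython's usual behaviour, where a plain dict lookup tries object identity before falling back to ==. The variable names are made up for illustration.

```python
import math

nan = float("nan")

# NaN never compares equal to anything, including itself.
nan_equals_itself = nan == nan

# Use a dedicated check rather than == to detect NaN.
nan_detected = math.isnan(nan)

# A plain dict keyed on NaN: the identical object is found (identity check),
# but a different NaN object is not, because == between the two is False.
lookup = {nan: False}
same_object_found = nan in lookup
other_nan_found = float("nan") in lookup

print(nan_equals_itself, nan_detected, same_object_found, other_nan_found)
```

So a NaN key only "works" in a plain dict when the lookup happens to use the very same object, which is not something to rely on.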

Minimal Demo

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope

>>> for c in df.columns:
        df[c] = df[c].map(answers)


>>> df
      0     1
0  True   NaN
1  True   NaN
2  True  True

# notice we're still stuck with NaN, as our nan strings weren't picked up
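If you are unsure whether a column holds real missing values or the text "nan", a quick check along these lines can tell you. This is only a sketch on a hypothetical Series, not your actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing a real missing value with the literal text "nan".
s = pd.Series(
    ["Star Wars: Episode IV  A New Hope", np.nan, "nan"],
    dtype=object,
)

real_missing = s.isna()   # True only where the value is a float NaN
text_nan = s == "nan"     # True only where the value is the string "nan"

# Count the Python type of each entry to see what the column really contains.
type_counts = s.map(lambda v: type(v).__name__).value_counts()

print(real_missing.tolist())
print(text_nan.tolist())
print(type_counts)
```

If the count for str covers every row, the "missing" entries are strings and need to be handled as text rather than as NaN.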

Better solution

With that being said, this doesn't seem like a good use for a dict or map - you could just define the Star Wars strings in a set, then use isin on your whole section of columns of interest.

answers = {
        "Star Wars: Episode I  The Phantom Menace",
        "Star Wars: Episode II  Attack of the Clones",
        "Star Wars: Episode III  Revenge of the Sith",
        "Star Wars: Episode IV  A New Hope",
        "Star Wars: Episode V  The Empire Strikes Back",
        "Star Wars: Episode VI  Return of the Jedi",
        }

star_wars.iloc[:, 3:9].isin(answers)

Minimal Demo

>>> answers = {
            "Star Wars: Episode I  The Phantom Menace",
            "Star Wars: Episode II  Attack of the Clones",
            "Star Wars: Episode III  Revenge of the Sith",
            "Star Wars: Episode IV  A New Hope",
            "Star Wars: Episode V  The Empire Strikes Back",
            "Star Wars: Episode VI  Return of the Jedi",
            }

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope

>>> df.isin(answers)

      0      1
0  True  False
1  True  False
2  True   True
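Since the stated goal is simply "NaN becomes False, everything else becomes True", another option is to skip the list of titles entirely and test for missing values. The sketch below assumes a small made-up frame; the second expression also treats the literal string "nan" as missing, in case that is what the columns contain.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame standing in for the survey columns.
df = pd.DataFrame({
    0: ["Star Wars: Episode I  The Phantom Menace",
        "Star Wars: Episode IV  A New Hope",
        "Star Wars: Episode IV  A New Hope"],
    1: [np.nan, "nan", "Star Wars: Episode IV  A New Hope"],
})

# True wherever a real value is present, False for float NaN.
seen = df.notna()

# Same idea, but also count the text "nan" as missing.
seen_strict = df.notna() & (df != "nan")

print(seen)
print(seen_strict)
```

Unlike isin, this does not verify that the value is one of the expected titles, so it fits best when any non-missing entry should count as True.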
