Austin

Reputation: 7339

Replacing NaN with pandas series.map(dict)

I'm following a pandas tutorial that shows how to replace values in columns by passing a dictionary to the Series.map method. Here's a snippet from the tutorial: [image: tutorial snippet]

However, when I try this:

cols = star_wars.columns[3:9]

# Booleans for column values
answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        NaN:False
        }

for c in cols:
    star_wars[c] = star_wars[c].map(answers) 

I get NameError: name 'NaN' is not defined

So what am I doing wrong?

edit: To explain my goal a little better, I have columns that look like this: [image: sample of the columns]

And I'm trying to replace the NaNs with False and the non-NaNs with True.

edit 2: Here's an image of the problem I'm still facing after changing NaN to np.NaN: [image: output after the change]

Then if I rerun the mapping cell and display the output again, all the False and NaN values flip-flop.

Upvotes: 1

Views: 2959

Answers (2)

miradulo

Reputation: 29710

Quite simply, Python doesn't have a built-in NaN name. NumPy does, however, so you could get your mapping to not throw an error by using np.nan. There is also math.nan, which is equivalent to float('nan'), as Jon pointed out.

answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        np.nan:False
        }

Don't stop here though, because that won't work. The other tricky thing is that NaN does not compare equal to anything, not even to itself, so using it as a key in a mapping like this won't be effective.

>>> np.nan == np.nan 
False

Thus, the NaN values in your DataFrame won't be picked up by np.nan as a key anyway, and will remain NaN. For a further explanation of this, see "NaNs as key in dictionaries". Furthermore, I would wager that your nan values are actually the string "nan".
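To make the dictionary-key behaviour concrete, here is a minimal sketch in plain Python (no pandas involved). It uses math.nan rather than np.nan so that it only needs the standard library, and the variable names are just for illustration. It shows that a lookup only succeeds when the key is the very same NaN object, because dict lookup checks identity before equality; how pandas itself resolves NaN keys inside Series.map is a separate matter and is not covered here.

```python
import math

# A mapping with NaN as a key, as in the question.
answers = {math.nan: False}

# NaN never compares equal, not even to itself.
nan_equals_itself = (math.nan == math.nan)

# Looking up with the very same object succeeds, because dict lookup
# tests identity ("is") before falling back to equality ("==").
same_object_hit = answers.get(math.nan)

# A different NaN object is neither identical nor equal to the key,
# so the lookup misses and .get() returns None.
other_nan = float("nan")
other_object_hit = answers.get(other_nan)

print(nan_equals_itself, same_object_hit, other_object_hit)
```

So whether a NaN key "works" depends on which NaN object is used for the lookup, which is why relying on it in a plain dict is fragile.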

Minimal Demo

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope

>>> for c in df.columns:
...     df[c] = df[c].map(answers)
...

>>> df
      0     1
0  True   NaN
1  True   NaN
2  True  True

# notice we're still stuck with NaN, as our nan strings weren't picked up
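If you want to check whether a column holds real missing values or the text "nan", a quick test like the following may help. This is only a sketch on a made-up three-row Series (the name col and its contents are assumptions, not your data): isna() flags genuine missing values, while eq("nan") flags the literal string.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing a title, the string "nan", and a real NaN.
col = pd.Series(
    ["Star Wars: Episode IV  A New Hope", "nan", np.nan],
    dtype=object,
)

real_missing = col.isna()    # True only for the genuine missing value
string_nan = col.eq("nan")   # True only for the literal text "nan"

# Turn the "nan" strings into real missing values, then flag what is left.
cleaned = col.mask(string_nan)
seen = cleaned.notna()

print(real_missing.tolist(), string_nan.tolist(), seen.tolist())
```

If string_nan contains any True values, the data holds "nan" strings, and they need to be converted (as with mask above) before anything that relies on NaN detection will behave as expected.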

Better solution

With that being said, this doesn't seem like a good use for a dict or map - you could just define the Star Wars strings in a set, then use isin on your whole section of columns of interest.

answers = {
        "Star Wars: Episode I  The Phantom Menace",
        "Star Wars: Episode II  Attack of the Clones",
        "Star Wars: Episode III  Revenge of the Sith",
        "Star Wars: Episode IV  A New Hope",
        "Star Wars: Episode V  The Empire Strikes Back",
        "Star Wars: Episode VI  Return of the Jedi",
        }

star_wars.iloc[:, 3:9].isin(answers)

Minimal Demo

>>> answers = {
            "Star Wars: Episode I  The Phantom Menace",
            "Star Wars: Episode II  Attack of the Clones",
            "Star Wars: Episode III  Revenge of the Sith",
            "Star Wars: Episode IV  A New Hope",
            "Star Wars: Episode V  The Empire Strikes Back",
            "Star Wars: Episode VI  Return of the Jedi",
            }

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope

>>> df.isin(answers)

      0      1
0  True  False
1  True  False
2  True   True

Upvotes: 3

Austin

Reputation: 7339

The problem I had with the other solution is that, because of how it works, the code does not behave the same way after the first time it is run. I'm working in a Jupyter notebook, so I want something I can run multiple times. I'm only a Python beginner, but the following code seems to be able to run multiple times and only change the values the first time it is run:

cols = star_wars.columns[3:9]

# Booleans for column values
answers = {
        "Star Wars: Episode I  The Phantom Menace":True,
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        True:True,
        False:False,
        np.nan:False
        }

for c in cols:
    star_wars[c] = star_wars[c].map(answers)    
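Another way to get a cell that is safe to rerun, sketched here under the assumption that the missing entries are real NaN values rather than "nan" strings, is to skip columns that are already boolean. The helper name to_seen_flags, the column names, and the tiny DataFrame are all made up for illustration. It uses notna() instead of a dictionary, since the goal is only "NaN becomes False, anything else becomes True".

```python
import numpy as np
import pandas as pd

def to_seen_flags(frame, columns):
    """Convert answer columns to booleans in place; safe to call repeatedly."""
    for c in columns:
        # Columns converted on an earlier run are already bool: leave them alone.
        if not pd.api.types.is_bool_dtype(frame[c]):
            frame[c] = frame[c].notna()
    return frame

# Hypothetical stand-in for the six survey columns.
star_wars = pd.DataFrame({
    "seen_1": ["Star Wars: Episode I  The Phantom Menace", np.nan, np.nan],
    "seen_4": ["Star Wars: Episode IV  A New Hope",
               "Star Wars: Episode IV  A New Hope", np.nan],
})
cols = star_wars.columns

to_seen_flags(star_wars, cols)
first_run = star_wars.copy()

# Running the same cell again leaves the result unchanged.
to_seen_flags(star_wars, cols)
print(star_wars)
```

The dtype check is what makes the second run a no-op; without it, notna() on the already-converted columns would turn every value into True.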

Upvotes: -1
