Reputation: 7339
I'm following a pandas tutorial that shows replacing values in columns by passing a dictionary to the series.map method. Here's a snippet from the tutorial:
However when I try this:
cols = star_wars.columns[3:9]
# Booleans for column values
answers = {
"Star Wars: Episode I The Phantom Menace":True,
"Star Wars: Episode II Attack of the Clones":True,
"Star Wars: Episode III Revenge of the Sith":True,
"Star Wars: Episode IV A New Hope":True,
"Star Wars: Episode V The Empire Strikes Back":True,
"Star Wars: Episode VI Return of the Jedi":True,
NaN:False
}
for c in cols:
    star_wars[c] = star_wars[c].map(answers)
I get NameError: name 'NaN' is not defined
So what am I doing wrong?
edit: To explain my goal a little better, I have columns that look like this:
And I'm trying to replace the NaNs with False and the non-NaNs with True.
edit 2: Here's an image of the problem I'm still facing after changing NaN to np.NaN:
Then if I rerun the mapping cell and display the output again, all the False and NaN values flip-flop.
Upvotes: 1
Views: 2959
Reputation: 29710
Quite simply, Python doesn't have a built-in NaN name. NumPy does, however, so you could get your mapping to not throw an error by using np.nan. There is also math.nan, which is equivalent to float('nan'), as Jon pointed out.
answers = {
"Star Wars: Episode I The Phantom Menace":True,
"Star Wars: Episode II Attack of the Clones":True,
"Star Wars: Episode III Revenge of the Sith":True,
"Star Wars: Episode IV A New Hope":True,
"Star Wars: Episode V The Empire Strikes Back":True,
"Star Wars: Episode VI Return of the Jedi":True,
np.nan:False
}
Don't stop here though, because that won't work.
The other tricky thing is that nan doesn't technically equal anything, not even itself, so using it as a key in a mapping like this won't be effective.
>>> np.nan == np.nan
False
Thus, the NaN values in your DataFrame won't be picked up by np.nan as a key anyway, and will remain NaN. For a further explanation of this, see NaNs as key in dictionaries. Furthermore, I would wager that your nan values are actually the string nan.
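To make the point about NaN keys concrete, here is a minimal sketch using only a plain Python dict. The names lookup, same_object_found and other_nan_found are made up for illustration. It only shows how dict lookup behaves; pandas may treat NaN differently in its own lookups, so check against your actual data.

```python
import math

# A dict keyed on one particular NaN object.
lookup = {math.nan: False}

# NaN never compares equal, not even to itself.
nan_equals_itself = math.nan == math.nan

# The very same object is found, because dict lookup checks identity before ==.
same_object_found = math.nan in lookup

# A different NaN object fails both the identity and the equality check.
other_nan_found = float("nan") in lookup

print(nan_equals_itself, same_object_found, other_nan_found)  # False True False
```

So whether a NaN key matches depends on which NaN object you happen to look up, which is a fragile thing to build a mapping on.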
Minimal Demo
>>> df
                                         0                                 1
0  Star Wars: Episode I The Phantom Menace                               nan
1         Star Wars: Episode IV A New Hope                               nan
2         Star Wars: Episode IV A New Hope  Star Wars: Episode IV A New Hope
>>> for c in df.columns:
...     df[c] = df[c].map(answers)
...
>>> df
      0     1
0  True   NaN
1  True   NaN
2  True  True
# notice we're still stuck with NaN, as our nan strings weren't picked up
With that being said, this doesn't seem like a good use for a dict or map. You could just define the Star Wars strings in a set, then use isin on the whole block of columns you're interested in.
answers = {
"Star Wars: Episode I The Phantom Menace",
"Star Wars: Episode II Attack of the Clones",
"Star Wars: Episode III Revenge of the Sith",
"Star Wars: Episode IV A New Hope",
"Star Wars: Episode V The Empire Strikes Back",
"Star Wars: Episode VI Return of the Jedi",
}
star_wars.iloc[:, 3:9].isin(answers)
Minimal Demo
>>> answers = {
"Star Wars: Episode I The Phantom Menace",
"Star Wars: Episode II Attack of the Clones",
"Star Wars: Episode III Revenge of the Sith",
"Star Wars: Episode IV A New Hope",
"Star Wars: Episode V The Empire Strikes Back",
"Star Wars: Episode VI Return of the Jedi",
}
>>> df
                                         0                                 1
0  Star Wars: Episode I The Phantom Menace                               nan
1         Star Wars: Episode IV A New Hope                               nan
2         Star Wars: Episode IV A New Hope  Star Wars: Episode IV A New Hope
>>> df.isin(answers)
      0      1
0  True  False
1  True  False
2  True   True
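If you want to try this without the survey file, here is a self-contained sketch. The sample frame and the names titles and seen are hypothetical. Column 1 deliberately holds both the string "nan" and a real float NaN, to show that isin should return False for either one.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical set of titles for the demo.
titles = {
    "Star Wars: Episode I The Phantom Menace",
    "Star Wars: Episode IV A New Hope",
}

# Column 1 mixes the string "nan", a real NaN, and a valid title.
df = pd.DataFrame({
    0: ["Star Wars: Episode I The Phantom Menace",
        "Star Wars: Episode IV A New Hope",
        "Star Wars: Episode IV A New Hope"],
    1: ["nan", np.nan, "Star Wars: Episode IV A New Hope"],
})

# True only where the cell is one of the known titles.
seen = df.isin(titles)
print(seen)
```

Because membership is tested against the titles only, it doesn't matter which kind of missing value the column actually contains.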
Upvotes: 3
Reputation: 7339
So the problem I had with the other solution is that, because of how it works, the code doesn't behave the same way after the first time it is run. I'm working in a Jupyter notebook, so I want something I can run multiple times. I'm only a Python beginner, but the following code seems to be safe to run multiple times and only changes the values the first time it is run:
cols = star_wars.columns[3:9]
# Booleans for column values
answers = {
"Star Wars: Episode I The Phantom Menace":True,
"Star Wars: Episode II Attack of the Clones":True,
"Star Wars: Episode III Revenge of the Sith":True,
"Star Wars: Episode IV A New Hope":True,
"Star Wars: Episode V The Empire Strikes Back":True,
"Star Wars: Episode VI Return of the Jedi":True,
True:True,
False:False,
np.nan:False
}
for c in cols:
    star_wars[c] = star_wars[c].map(answers)
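Another way to get the same rerunnable behaviour might be to skip any column that is already boolean. This is only a sketch: mark_seen and the demo frame are made-up names, and it assumes the missing answers are real NaN values rather than the string "nan" (notna would count that string as present).

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype

def mark_seen(df, cols):
    """Turn title-or-NaN columns into booleans; leave boolean columns alone."""
    for c in cols:
        # Columns converted on an earlier run are skipped, so reruns are no-ops.
        if not is_bool_dtype(df[c]):
            df[c] = df[c].notna()
    return df

# Hypothetical stand-in for two of the star_wars columns.
demo = pd.DataFrame({
    "seen_1": ["Star Wars: Episode I The Phantom Menace", np.nan],
    "seen_2": [np.nan, "Star Wars: Episode II Attack of the Clones"],
})

first = mark_seen(demo, demo.columns).copy()
second = mark_seen(demo, demo.columns)  # running it again changes nothing
```

With this approach the dictionary isn't needed at all, since the only question per cell is whether an answer is present.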
Upvotes: -1