I am trying to use Python and pandas to replace NaN in the 'size' column with a specific value that depends on the City. In the example below, I am trying to assign a value of 18 if the City is St. Louis.
I used a lambda function because the original DataFrame has many rows with repeated City names, and only a few of them have NaN values.
When I run the code, I get an error: KeyError: ('size', 'occurred at index City')
Below is the code snippet:
import pandas as pd

raw_data = {'City': ['Dallas', 'Chicago', 'St Louis', 'SFO', 'St Louis'],
            'size': [24, 36, 'NaN', 'NaN', 22],
            'Type': ['Pie', 'Hallo', 'Zombi', 'Dru', 'Zoro']
            }
df = pd.DataFrame(raw_data)
df
df['size'] = df.apply(lambda x : x['size'].fillna(value = 18 if x['City' == 'St Louis'] else x['size'], axis = 1, inplace = True))
df
Expected: 18 populated in the size column for St. Louis.
Actual: KeyError: ('size', 'occurred at index City')
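For reference, the KeyError comes from where axis=1 ended up: it is passed to fillna inside the lambda, so apply runs with its default axis=0 and hands the lambda one column at a time (City first). Looking up 'size' inside the City column then fails. Also, x['City' == 'St Louis'] compares two strings first and so indexes with False. Below is a minimal row-wise sketch, assuming the missing sizes are real NaN values (np.nan) rather than the string 'NaN'.

```python
import numpy as np
import pandas as pd

raw_data = {'City': ['Dallas', 'Chicago', 'St Louis', 'SFO', 'St Louis'],
            'size': [24, 36, np.nan, np.nan, 22],
            'Type': ['Pie', 'Hallo', 'Zombi', 'Dru', 'Zoro']}
df = pd.DataFrame(raw_data)

# With the default axis=0, apply passes each *column* to the lambda,
# so x.name is a column label and x['size'] would raise KeyError.
print(df.apply(lambda x: x.name).tolist())  # ['City', 'size', 'Type']

# Row-wise version: axis=1 is an argument of apply, not of fillna.
# Each x is a row, so x['size'] is a scalar and pd.isna tests it directly.
df['size'] = df.apply(
    lambda x: 18 if x['City'] == 'St Louis' and pd.isna(x['size']) else x['size'],
    axis=1,
)
print(df)
```

Row-wise apply calls a Python function once per row, so on a large DataFrame a vectorized mask with df.loc is usually faster.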
There is a simple solution using the fillna method:
df['size'] = df['size'].fillna(18)
EDIT:
What I failed to notice is that you populate the cells with the string 'NaN', not with real NaN values.
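A short sketch of why this matters: fillna only fills real missing values, so the string 'NaN' is left untouched. If changing the input data is not an option, pd.to_numeric with errors='coerce' turns non-numeric strings such as 'NaN' into real NaN first. The series s is just an illustration built from the question's size values.

```python
import pandas as pd

s = pd.Series([24, 36, 'NaN', 'NaN', 22])

# The string 'NaN' is not a missing value, so isna() finds nothing to fill.
print(s.isna().sum())  # 0

# errors='coerce' converts anything that is not a number into a real NaN.
s = pd.to_numeric(s, errors='coerce')
print(s.isna().sum())  # 2
print(s.fillna(18).tolist())  # [24.0, 36.0, 18.0, 18.0, 22.0]
```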
If you change your input data to
import numpy as np
import pandas as pd

raw_data = {'City': ['Dallas', 'Chicago', 'St Louis', 'SFO', 'St Louis'],
            'size': [24, 36, np.nan, np.nan, 22],
            'Type': ['Pie', 'Hallo', 'Zombi', 'Dru', 'Zoro']
            }
then the following method lets you fill the cells of the size column by city name:
df = pd.DataFrame(raw_data)
df[['City', 'size']] = df.set_index('City')['size'].fillna({'St Louis': 18, 'SFO': 20}).reset_index()
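A related sketch that avoids the set_index/reset_index round trip: Series.fillna also accepts a Series and aligns it on the index, so mapping each City to its default gives a per-row fill value. The city_defaults dict is a hypothetical name; cities missing from it map to NaN and stay missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'City': ['Dallas', 'Chicago', 'St Louis', 'SFO', 'St Louis'],
                   'size': [24, 36, np.nan, np.nan, 22],
                   'Type': ['Pie', 'Hallo', 'Zombi', 'Dru', 'Zoro']})

# Hypothetical per-city defaults; unlisted cities map to NaN.
city_defaults = {'St Louis': 18, 'SFO': 20}

# map() builds a Series aligned on df's index; fillna takes the value
# from the same row only where 'size' is missing.
df['size'] = df['size'].fillna(df['City'].map(city_defaults))
print(df['size'].tolist())  # [24.0, 36.0, 18.0, 20.0, 22.0]
```

Because the other columns are never touched, this also sidesteps reassigning City from the reset index.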
If all you're trying to do is set the size of St. Louis, you can run:
df.loc[df['City'] == 'St Louis', 'size'] = 18
However, if you instead want to set all values of NaN to 18, you could likewise run:
df.loc[df['size'] == 'NaN', 'size'] = 18
And if you'd just like to set the size of only those St. Louis entries where the size is NaN, you could do:
df.loc[(df['City'] == 'St Louis') & (df['size'] == 'NaN'), 'size'] = 18
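A minimal sketch of the combined condition, assuming the size column holds real NaN values (np.nan) instead of the string 'NaN', so isna() takes the place of an == 'NaN' comparison. Python's and cannot combine two boolean Series (it raises "The truth value of a Series is ambiguous"), so the element-wise & operator is used, with the == comparison in parentheses. Only the St Louis row whose size is missing changes; the other St Louis row keeps 22.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'City': ['Dallas', 'Chicago', 'St Louis', 'SFO', 'St Louis'],
                   'size': [24, 36, np.nan, np.nan, 22],
                   'Type': ['Pie', 'Hallo', 'Zombi', 'Dru', 'Zoro']})

# & binds tighter than ==, so the comparison needs its own parentheses.
mask = (df['City'] == 'St Louis') & df['size'].isna()
df.loc[mask, 'size'] = 18
print(df['size'].tolist())  # [24.0, 36.0, 18.0, nan, 22.0]
```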