Reputation: 53
I want to fill missing values of categorical values in Pandas data frame with the most frequent values on another category. For example,
import pandas as pd
import numpy as np
data = {'type': ['softdrink', 'juice', 'softdrink', 'softdrink', 'juice','juice','juice'],
'product': ['coca', np.nan, 'pepsi', 'pepsi', 'orange','grape',np.nan],
'price': [25, 94, 57, 62, 70,50,60]}
df = pd.DataFrame(data)
df
which results in
price | product | type
0 25 | coca | softdrink
1 94 | NaN | juice
2 57 | pepsi | softdrink
3 62 | pepsi | softdrink
4 70 | orange | juice
5 50 | grape | juice
6 60 | NaN | softdrink
First, I use groupby as
df.groupby('type')['product'].value_counts()
to get
type | product
juice | grape | 1
| orange | 1
softdrink | pepsi | 2
| coca | 1
Name: product, dtype: int64
I want to fill a missing product of second row with "pepsi" (the most infrequence) but filling "grape" for missing value of row 6 of category "juice". Without categorical group, my solution is to find most frequent value by the column and assign this value to missing value.
df['product'].fillna(df['product'].value_counts().index[0],inplace=True)
I struggle to complete the task since the return value of the command
df.groupby('type')['product'].value_counts()
is pandas series which can be accessed by
df.groupby('type')['product'].value_counts()['softdrink']['pepsi']
how I know which product+category has the most frequence.
Upvotes: 3
Views: 7487
Reputation: 323226
IIUC
Using mode
Data input
import pandas as pd
import numpy as np
data = {'type': ['softdrink', 'juice', 'softdrink', 'softdrink', 'juice','juice','softdrink'],
'product': ['coca', np.nan, 'pepsi', 'pepsi', 'orange','grape',np.nan],
'price': [25, 94, 57, 62, 70,50,60]}
df = pd.DataFrame(data)
solution
df.groupby('type').product.transform(lambda x: x.fillna(x.mode()[0]))
Out[28]:
0 coca
1 grape
2 pepsi
3 pepsi
4 orange
5 grape
6 pepsi
Name: product, dtype: object
New df
df['product']=df.groupby('type').product.transform(lambda x: x.fillna(x.mode()[0]))
df
Out[40]:
price product type
0 25 coca softdrink
1 94 grape juice
2 57 pepsi softdrink
3 62 pepsi softdrink
4 70 orange juice
5 50 grape juice
6 60 pepsi softdrink
Upvotes: 3