Filling missing values of categorical values based on other categorical values in pandas dataframe

Question

I want to fill missing values of categorical values in Pandas data frame with the most frequent values on another category. For example,

import pandas as pd
import numpy as np
data = {'type': ['softdrink', 'juice', 'softdrink', 'softdrink',    'juice','juice','juice'],
    'product': ['coca', np.nan, 'pepsi', 'pepsi', 'orange','grape',np.nan], 
    'price': [25, 94, 57, 62, 70,50,60]}
df = pd.DataFrame(data)
df

which results in

      price     | product   |   type    
0   25          |  coca     | softdrink   
1   94          |   NaN     | juice    
2   57          |   pepsi   | softdrink    
3   62          |   pepsi   | softdrink    
4   70          |   orange  | juice    
5   50          |    grape  | juice    
6   60          |   NaN     | softdrink

First, I use groupby as

df.groupby('type')['product'].value_counts()

to get

type      |   product    
juice     |    grape  |   1    
          |   orange  |   1    
softdrink | pepsi     |   2    
          | coca      |   1    
Name: product, dtype: int64

I want to fill a missing product of second row with "pepsi" (the most infrequence) but filling "grape" for missing value of row 6 of category "juice". Without categorical group, my solution is to find most frequent value by the column and assign this value to missing value.

df['product'].fillna(df['product'].value_counts().index[0],inplace=True)

I struggle to complete the task since the return value of the command

df.groupby('type')['product'].value_counts()

is pandas series which can be accessed by

df.groupby('type')['product'].value_counts()['softdrink']['pepsi']

how I know which product+category has the most frequence.

Filling missing values of categorical values based on other categorical values in pandas dataframe

Answers (1)

Related Questions