shahram kalantari

Reputation: 863

Fill null values in a column of a pandas DataFrame

I have a pandas DataFrame with more than 4 columns. Some values in col1 are missing, and I want to fill them using the following fallback approach:

  1. Set it to the average of col1 over the records that have the same col2, col3, and col4 values.
  2. If there is no such record, set it to the average of col1 over the records that have the same col2 and col3 values.
  3. If there is still no such record, set it to the average of col1 over the records that have the same col2 value.
  4. If none of the above can be found, set it to the average of all non-missing values in col1.

What's the best way to do this?

Upvotes: 0

Views: 1311

Answers (2)

Dinesh vishe

Reputation: 3608

Fill the null values with zero:

df_with_dummies.fillna(value=0, inplace=True)

Upvotes: 0

Quang Hoang

Reputation: 150805

Based on your logic, you can do something like the following, where each fillna call corresponds to a bullet point in your question, in the same order:

df['col1'] = (df['col1']
               .fillna(df.groupby(['col2','col3','col4'])['col1'].transform('mean'))
               .fillna(df.groupby(['col2','col3'])['col1'].transform('mean'))
               .fillna(df.groupby(['col2'])['col1'].transform('mean'))
               .fillna(df['col1'].mean())
             )
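To check the fallback order on something small, here is a minimal sketch. The sample data and the helper name fill_col1_hierarchically are made up for illustration and are not from the question; the column names are the ones used above. Each missing value in the sample is arranged so that it can only be filled at one specific level.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each NaN can only be resolved at one fallback level.
df = pd.DataFrame({
    "col1": [1.0, np.nan, 3.0, np.nan, 8.0, np.nan, 6.0, np.nan],
    "col2": ["a", "a", "a", "a", "a", "a", "b", "c"],
    "col3": ["x", "x", "x", "x", "z", "y", "x", "x"],
    "col4": ["p", "p", "q", "r", "p", "p", "p", "p"],
})


def fill_col1_hierarchically(frame):
    """Return a copy of frame with col1 filled from the most specific group mean available."""
    out = frame.copy()
    filled = out["col1"]
    # Try the most specific grouping first, then progressively coarser ones.
    for keys in (["col2", "col3", "col4"], ["col2", "col3"], ["col2"]):
        # transform('mean') gives NaN for groups with no non-missing col1,
        # so those rows stay NaN and fall through to the next level.
        group_mean = frame.groupby(keys)["col1"].transform("mean")
        filled = filled.fillna(group_mean)
    # Last resort: mean of all originally non-missing values in col1.
    out["col1"] = filled.fillna(frame["col1"].mean())
    return out


result = fill_col1_hierarchically(df)
print(result["col1"].tolist())
# [1.0, 1.0, 3.0, 2.0, 8.0, 4.0, 6.0, 4.5]
```

Row 1 is filled from its (col2, col3, col4) group, row 3 from (col2, col3), row 5 from col2 alone, and row 7 from the overall mean. All means are computed from the original non-missing values, so values filled at an earlier level do not feed into later averages.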

Upvotes: 2
