Deshwal

Reputation: 4152

Using apply() with a different function for each column of a DataFrame

I have a DataFrame with the columns age and salary. Some of the values are NaN, and I want to fill them using the mean and the median.

Original DataFrame


    age    salary
0  20.0       NaN
1  45.0   22323.0
2   NaN  598454.0
3  32.0       NaN
4   NaN   48454.0

I want to fill the missing age values with the mean() of that column and the missing salary values with the median() of that column, using apply().

I used

df['age','salary'].apply({'age':lambda row:row.fillna(row.mean()), 'salary':lambda row:row.fillna(row.median()) })

It raises KeyError: ('age', 'salary'), even when I pass axis=1.

Expected output

         age    salary
0  20.000000   48454.0
1  45.000000   22323.0
2  32.333333  598454.0
3  32.000000   48454.0
4  32.333333   48454.0

Can someone show me how to do it properly and what is happening in the background?

Please tell me if there are other ways too. I am learning pandas from scratch.

Upvotes: 0

Views: 58

Answers (2)

Massifox

Reputation: 4487

According to the documentation, the easiest way to do what you ask is to pass a dictionary as the value parameter of fillna():

value : scalar, dict, Series, or DataFrame

Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.

In your case the code would be:

df.fillna(value={'age': df.age.mean(), 'salary': df.salary.median()}, inplace=True)

and gives:

         age    salary
0  20.000000   48454.0
1  45.000000   22323.0
2  32.333333  598454.0
3  32.000000   48454.0
4  32.333333   48454.0
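
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the five-row frame from the question and fills it without inplace=True, so the original stays untouched. The names fill_values and filled are only illustrative, not part of pandas.

```python
import numpy as np
import pandas as pd

# The frame from the question, rebuilt by hand.
df = pd.DataFrame({
    "age": [20.0, 45.0, np.nan, 32.0, np.nan],
    "salary": [np.nan, 22323.0, 598454.0, np.nan, 48454.0],
})

# One fill value per column: mean for age, median for salary.
# mean() and median() skip NaN by default.
fill_values = {"age": df["age"].mean(), "salary": df["salary"].median()}

# Without inplace=True, fillna returns a new DataFrame and leaves df as it was.
filled = df.fillna(value=fill_values)
print(filled)
```

Columns that are not keys in the dictionary are left alone, which is what makes this form convenient when only some columns need filling.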

Upvotes: 1

nimrodm

Reputation: 23799

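How about computing the fill values before running apply? That is, compute the mean of age and the median of salary first, then use the following. Note the extra [] brackets: selecting multiple columns requires a list, so df[['age','salary']] works, whereas df['age','salary'] looks for a single column keyed by the tuple ('age','salary') and raises the KeyError you saw.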

median_salary = df['salary'].median()
mean_age = df['age'].mean()

df[['age','salary']].apply({'age': lambda r: r.fillna(mean_age), 'salary': lambda r: r.fillna(median_salary)}) 

Also note that this does not modify the DataFrame; it returns a new one. So if you want to update the columns, use something like:

df[['age', 'salary']] = df[['age', 'salary']].apply(...)
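
As a rough, self-contained sketch of the two points above (the bracket issue and the assignment back), the snippet below uses the data from the question. It swaps the dictionary of lambdas for a single lambda that looks up each column's statistic by name; fill_stat, cols and tuple_lookup_failed are hypothetical names introduced only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [20.0, 45.0, np.nan, 32.0, np.nan],
    "salary": [np.nan, 22323.0, 598454.0, np.nan, 48454.0],
})

# df["age", "salary"] is treated as one key, the tuple ("age", "salary").
# No column has that label, so pandas raises KeyError.
try:
    df["age", "salary"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# Hypothetical mapping: column name -> statistic used to fill that column.
fill_stat = {"age": "mean", "salary": "median"}
cols = list(fill_stat)

# With the default axis=0, apply() passes each column as a Series whose
# .name is the column label, so the right statistic can be looked up.
filled = df[cols].apply(lambda col: col.fillna(col.agg(fill_stat[col.name])))

# apply() returned a new frame; assign it back to update df itself.
df[cols] = filled
print(df)
```

This also shows why axis=1 does not help here: with axis=1 the function would receive rows, not columns, and a row's mean or median mixes age with salary.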

Or, in your case where you just want to fill in missing values, the best solution is probably:

df.fillna({'age': mean_age, 'salary': median_salary}, inplace=True)

Upvotes: 1
