How to handle Nan values when computing weighted mean

Question

I have a weights series like below:

a  0.2
b  0.3
c  0.5

and a dataframe:

   a    b   c
1  1    2   2
2  Nan  2   2
3  Nan  1   Nan
...

I want to compute the weighted mean of the dataframe like this:

(dataframe * weights).sum(axis=1)

The problem is that when the value of the dataframe is Nan i want the corresponding weight to be added equally to the other weights. For example for the second row weights should be 0.4 for b and 0.6 for c. And for the third row weight matrix should be 1 for b.

Maarten Fabr&#233; · Accepted Answer

First you can make an adjusted weights matrix

df2 = dataframe.copy()
df2[pd.notnull(df2)] = 1
df2 = df2 * weight
df2 = df2.multiply(1/df2.sum(axis=1), axis=0)
df2

results in this weigths matrix

    a   b   c
row             
1   0.2     0.300   0.500
2   NaN     0.375   0.625
3   NaN     1.000   NaN

then (df2 * dataframe).sum(axis=1) results in

row
1    1.8
2    2.0
3    1.0
dtype: float64

This can be done slightly quicker by using DafaFrame.where()

df2 = dataframe.where(pd.isnull,1) * weight
result = (df2.multiply(1 / df2.sum(axis=1), axis=0) * dataframe).sum(axis=1)

How to handle Nan values when computing weighted mean

Answers (2)

Related Questions