xro7
xro7

Reputation: 751

How to handle Nan values when computing weighted mean

I have a weights series like below:

a  0.2
b  0.3
c  0.5

and a dataframe:

   a    b   c
1  1    2   2
2  Nan  2   2
3  Nan  1   Nan
...

I want to compute the weighted mean of the dataframe like this:

(dataframe * weights).sum(axis=1) 

The problem is that when the value of the dataframe is Nan i want the corresponding weight to be added equally to the other weights. For example for the second row weights should be 0.4 for b and 0.6 for c. And for the third row weight matrix should be 1 for b.

Upvotes: 0

Views: 2350

Answers (2)

IanS
IanS

Reputation: 16251

You can use numpy's average for masked arrays, designed exactly for that use case. Let s be the weight series, and df the dataframe:

np.ma.average(np.ma.array(df.values, mask=df.isnull().values), 
              weights=s.values, axis=1)

The .data property contains the result:

array([ 1.8,  2. ,  1. ])

Edit: as suggested in the comments you can convert the result to a series:

pd.Series(np.ma.average(np.ma.array(df.values, mask=df.isnull().values), 
                        weights=s.values, axis=1).data, index=df.index)

Upvotes: 5

Maarten Fabré
Maarten Fabré

Reputation: 7058

First you can make an adjusted weights matrix

df2 = dataframe.copy()
df2[pd.notnull(df2)] = 1
df2 = df2 * weight
df2 = df2.multiply(1/df2.sum(axis=1), axis=0)
df2

results in this weigths matrix

    a   b   c
row             
1   0.2     0.300   0.500
2   NaN     0.375   0.625
3   NaN     1.000   NaN

then (df2 * dataframe).sum(axis=1) results in

row
1    1.8
2    2.0
3    1.0
dtype: float64

This can be done slightly quicker by using DafaFrame.where()

df2 = dataframe.where(pd.isnull,1) * weight
result = (df2.multiply(1 / df2.sum(axis=1), axis=0) * dataframe).sum(axis=1)

Upvotes: 4

Related Questions