Reputation: 751
I have a weights series like below:
a 0.2
b 0.3
c 0.5
and a dataframe:
a b c
1 1 2 2
2 Nan 2 2
3 Nan 1 Nan
...
I want to compute the weighted mean of the dataframe like this:
(dataframe * weights).sum(axis=1)
The problem is that when the value of the dataframe is Nan i want the corresponding weight to be added equally to the other weights. For example for the second row weights should be 0.4 for b and 0.6 for c. And for the third row weight matrix should be 1 for b.
Upvotes: 0
Views: 2350
Reputation: 16251
You can use numpy's average for masked arrays, designed exactly for that use case. Let s
be the weight series, and df
the dataframe:
np.ma.average(np.ma.array(df.values, mask=df.isnull().values),
weights=s.values, axis=1)
The .data
property contains the result:
array([ 1.8, 2. , 1. ])
Edit: as suggested in the comments you can convert the result to a series:
pd.Series(np.ma.average(np.ma.array(df.values, mask=df.isnull().values),
weights=s.values, axis=1).data, index=df.index)
Upvotes: 5
Reputation: 7058
First you can make an adjusted weights matrix
df2 = dataframe.copy()
df2[pd.notnull(df2)] = 1
df2 = df2 * weight
df2 = df2.multiply(1/df2.sum(axis=1), axis=0)
df2
results in this weigths matrix
a b c
row
1 0.2 0.300 0.500
2 NaN 0.375 0.625
3 NaN 1.000 NaN
then (df2 * dataframe).sum(axis=1)
results in
row
1 1.8
2 2.0
3 1.0
dtype: float64
This can be done slightly quicker by using DafaFrame.where()
df2 = dataframe.where(pd.isnull,1) * weight
result = (df2.multiply(1 / df2.sum(axis=1), axis=0) * dataframe).sum(axis=1)
Upvotes: 4