Reputation: 384
I can't wrap my head around solving this problem without itertuples.
I want to take mean values over groups of rows whose cumulative sum of a different column adds up to less than one third of that column's total sum.
Starting dataframe:
import pandas as pd

df = pd.DataFrame({'model_1': [0.15, 0.19, 0.25, 0.54, 0.55, 0.98, 1.12],
                   'model_2': [0.12, 0.13, 0.32, 0.45, 0.6, 0.7, 1.05],
                   'exposure': [0.4, 1, 1.6, 1, 2, 2, 3],
                   'target': [0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1.1]})
Here the total exposure is 11, and my intention is to make 3 buckets: each bucket collects rows until their cumulative exposure reaches 1/3 of the total, and I then take the mean over each bucket.
The first 4 rows have a cumulative exposure of 4, and over those rows I want an exposure-weighted mean of each column.
This means the first value in aggr_model_1 is:
((0.15 * 0.4) + (0.19 * 1) + (0.25 * 1.6) + (0.54 * 1)) / 4 = 0.2975
where the denominator 4 is the bucket's total exposure. The same process is then applied to aggr_model_2 and aggr_target.
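A quick sanity check of that arithmetic (a minimal sketch using just the two columns involved):

```python
import pandas as pd

# First bucket only: the 4 rows whose cumulative exposure is 4
df = pd.DataFrame({'model_1': [0.15, 0.19, 0.25, 0.54],
                   'exposure': [0.4, 1, 1.6, 1]})

# Exposure-weighted sum of model_1, divided by the bucket's total exposure
aggr_model_1 = (df.model_1 * df.exposure).sum() / df.exposure.sum()
print(round(aggr_model_1, 4))  # 0.2975
```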
Output dataframe:
output_df = pd.DataFrame({'aggr_model_1': [0.2975, 0.765, 1.12],
                          'aggr_model_2': [0.285, 0.65, 1.05],
                          'aggr_exposure': [4, 4, 3],
                          'aggr_target': [0.28, 0.65, 1.1]})
Upvotes: 0
Views: 202
Reputation: 1842
I'll take a shot; let's see if I understood it right. The ingredients of this computation are:
import numpy as np
import pandas as pd

total = df.exposure.sum()                      # 11.0
bins = np.linspace(0, total, 4)                # 3 equal-width bins over [0, total]
cum_exposure = df.exposure.cumsum()
bin_cum_exposure = pd.cut(cum_exposure, bins)  # bin label for each row
w_model_1 = df.exposure * df.model_1           # exposure-weighted model_1

# Standalone, the grouping works directly on the Series:
w_model_1.groupby(bin_cum_exposure).mean()
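For reference, this is what the intermediate pieces work out to on your data; note that the first bin catches only 3 rows, because the fourth cumulative value (4.0) already exceeds 11/3 ≈ 3.67:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'exposure': [0.4, 1, 1.6, 1, 2, 2, 3]})

total = df.exposure.sum()            # 11.0
bins = np.linspace(0, total, 4)      # [0, 3.667, 7.333, 11]
cum_exposure = df.exposure.cumsum()  # 0.4, 1.4, 3.0, 4.0, 6.0, 8.0, 11.0
bin_cum_exposure = pd.cut(cum_exposure, bins)

# Rows per bin: 3, 2, 2
print(bin_cum_exposure.value_counts(sort=False))
```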
Putting things together:
total = df.exposure.sum()
bins = np.linspace(0, total, 4)

(df.assign(bin_cum_exposure=lambda x: pd.cut(x.exposure.cumsum(), bins),
           w_model_1=lambda x: x.exposure * x.model_1,
           w_model_2=lambda x: x.exposure * x.model_2,
           w_target=lambda x: x.exposure * x.target)
   .groupby('bin_cum_exposure')
   .mean()
)
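One caveat: .mean() above is a plain mean of the weighted columns, while your manual computation divides the weighted sum by the bucket's total exposure. If the exposure-weighted mean is what you're after, here is one sketch (still with the 3/2/2 binning, so the numbers won't match your 4-row first bucket):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'model_1': [0.15, 0.19, 0.25, 0.54, 0.55, 0.98, 1.12],
                   'model_2': [0.12, 0.13, 0.32, 0.45, 0.6, 0.7, 1.05],
                   'exposure': [0.4, 1, 1.6, 1, 2, 2, 3],
                   'target': [0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1.1]})

bins = np.linspace(0, df.exposure.sum(), 4)
key = pd.cut(df.exposure.cumsum(), bins)

# Weight every value column by exposure, sum per bin,
# then divide by each bin's total exposure.
weighted = df[['model_1', 'model_2', 'target']].mul(df.exposure, axis=0)
result = weighted.groupby(key).sum().div(df.exposure.groupby(key).sum(), axis=0)
```

With this the first bin gives 0.65 / 3 ≈ 0.2167 for model_1, i.e. the same kind of weighted mean as your 0.2975, just over 3 rows instead of 4.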
The answer differs from your manual computation because the first bin gets 3 rows rather than the 4 in your example: the first 4 rows' cumulative exposure is 4, which already exceeds 11/3 ≈ 3.67.
Upvotes: 1