Reputation: 45
Suppose I have frame that is grouped that looks like this.
q_n q_N q_f
level_id year level_name question value
0 2020 Level_1 q1_1 1.0 0.0 82.0 0.000000
2.0 0.0 82.0 0.000000
3.0 3.0 82.0 0.036585
4.0 33.0 82.0 0.402439
5.0 46.0 82.0 0.560976
q1_2 1.0 1.0 82.0 0.012195
2.0 4.0 82.0 0.048780
3.0 11.0 82.0 0.134146
4.0 34.0 82.0 0.414634
5.0 32.0 82.0 0.390244
I'd like to compute a mean for each group, so the value
times q_n
and theirs sum within each group divided by q_N
(each row within the group should get the same mean).
I tried doing something like this:
df['q_m'] = ((df['q_n']*df.index.get_level_values('value')) / (df['q_n'].groupby(['year', 'level_name', 'question']).transform('sum')))
But the ceiling in the division is not getting properly summed.
How can I achieve a column q_m
that for q1_1
has 4.52 for each row and 4.12 for each row in q1_2
?
For testing:
df.reset_index().to_records()
rec.array([(0, 0, 2020., 'level_1', 'q1_1', 1., 0., 82., 0. , 0. ),
(1, 0, 2020., 'level_1', 'q1_1', 2., 0., 82., 0. , 0. ),
(2, 0, 2020., 'level_1', 'q1_1', 3., 3., 82., 0.03658537, 0.1097561 ),
(3, 0, 2020., 'level_1', 'q1_1', 4., 33., 82., 0.40243902, 1.6097561 ),
(4, 0, 2020., 'level_1', 'q1_1', 5., 46., 82., 0.56097561, 2.80487805),
(5, 0, 2020., 'level_1', 'q1_2', 1., 1., 82., 0.01219512, 0.01219512),
(6, 0, 2020., 'level_1', 'q1_2', 2., 4., 82., 0.04878049, 0.09756098),
(7, 0, 2020., 'level_1', 'q1_2', 3., 11., 82., 0.13414634, 0.40243902),
(8, 0, 2020., 'level_1', 'q1_2', 4., 34., 82., 0.41463415, 1.65853659),
(9, 0, 2020., 'level_1', 'q1_2', 5., 32., 82., 0.3902439 , 1.95121951)],
dtype=[('index', '<i8'), ('level_id', '<i8'), ('year', '<f8'), ('level_name', 'O'), ('question', 'O'), ('value', '<f8'), ('q_n', '<f8'), ('q_N', '<f8'), ('q_f', '<f8'), ('q_m', '<f8')])
df.to_dict()
{'q_n': {(0, 2020.0, 'Level_1', 'q1_1', 1.0): 0.0,
(0, 2020.0, 'Level_1', 'q1_1', 2.0): 0.0,
(0, 2020.0, 'Level_1', 'q1_1', 3.0): 3.0,
(0, 2020.0, 'Level_1', 'q1_1', 4.0): 33.0,
(0, 2020.0, 'Level_1', 'q1_1', 5.0): 46.0,
(0, 2020.0, 'Level_1', 'q1_2', 1.0): 1.0,
(0, 2020.0, 'Level_1', 'q1_2', 2.0): 4.0,
(0, 2020.0, 'Level_1', 'q1_2', 3.0): 11.0,
(0, 2020.0, 'Level_1', 'q1_2', 4.0): 34.0,
(0, 2020.0, 'Level_1', 'q1_2', 5.0): 32.0},
'q_N': {(0, 2020.0, 'Level_1', 'q1_1', 1.0): 82.0,
(0, 2020.0, 'Level_1', 'q1_1', 2.0): 82.0,
(0, 2020.0, 'Level_1', 'q1_1', 3.0): 82.0,
(0, 2020.0, 'Level_1', 'q1_1', 4.0): 82.0,
(0, 2020.0, 'Level_1', 'q1_1', 5.0): 82.0,
(0, 2020.0, 'Level_1', 'q1_2', 1.0): 82.0,
(0, 2020.0, 'Level_1', 'q1_2', 2.0): 82.0,
(0, 2020.0, 'Level_1', 'q1_2', 3.0): 82.0,
(0, 2020.0, 'Level_1', 'q1_2', 4.0): 82.0,
(0, 2020.0, 'Level_1', 'q1_2', 5.0): 82.0},
'q_f': {(0, 2020.0, 'Level_1', 'q1_1', 1.0): 0.0,
(0, 2020.0, 'Level_1', 'q1_1', 2.0): 0.0,
(0, 2020.0, 'Level_1', 'q1_1', 3.0): 0.036585365853658534,
(0, 2020.0, 'Level_1', 'q1_1', 4.0): 0.4024390243902439,
(0, 2020.0, 'Level_1', 'q1_1', 5.0): 0.5609756097560976,
(0, 2020.0, 'Level_1', 'q1_2', 1.0): 0.012195121951219513,
(0, 2020.0, 'Level_1', 'q1_2', 2.0): 0.04878048780487805,
(0, 2020.0, 'Level_1', 'q1_2', 3.0): 0.13414634146341464,
(0, 2020.0, 'Level_1', 'q1_2', 4.0): 0.4146341463414634,
(0, 2020.0, 'Level_1', 'q1_2', 5.0): 0.3902439024390244},
'q_m': {(0, 2020.0, 'Level_1', 'q1_1', 1.0): 0.0,
(0, 2020.0, 'Level_1', 'q1_1', 2.0): 0.0,
(0, 2020.0, 'Level_1', 'q1_1', 3.0): 0.10975609756097561,
(0, 2020.0, 'Level_1', 'q1_1', 4.0): 1.6097560975609757,
(0, 2020.0, 'Level_1', 'q1_1', 5.0): 2.8048780487804876,
(0, 2020.0, 'Level_1', 'q1_2', 1.0): 0.012195121951219513,
(0, 2020.0, 'Level_1', 'q1_2', 2.0): 0.0975609756097561,
(0, 2020.0, 'Level_1', 'q1_2', 3.0): 0.4024390243902439,
(0, 2020.0, 'Level_1', 'q1_2', 4.0): 1.6585365853658536,
(0, 2020.0, 'Level_1', 'q1_2', 5.0): 1.951219512195122}}
Upvotes: 0
Views: 49
Reputation: 150745
Seems you transform the wrong series:
df['q_m'] = ((df['q_n']*df.index.get_level_values('value'))
.groupby(['year','level_name','question'])
.transform('sum')
.div(df['q_N'])
)
Output:
q_n q_N q_f q_m
level_id year level_name question value
0 2020.0 Level_1 q1_1 1.0 0.0 82.0 0.000000 4.524390
2.0 0.0 82.0 0.000000 4.524390
3.0 3.0 82.0 0.036585 4.524390
4.0 33.0 82.0 0.402439 4.524390
5.0 46.0 82.0 0.560976 4.524390
q1_2 1.0 1.0 82.0 0.012195 4.121951
2.0 4.0 82.0 0.048780 4.121951
3.0 11.0 82.0 0.134146 4.121951
4.0 34.0 82.0 0.414634 4.121951
5.0 32.0 82.0 0.390244 4.121951
Upvotes: 1