destinychoice
destinychoice

Reputation: 45

pandas multiply multiindex with another column

Suppose I have frame that is grouped that looks like this.

                                                  q_n   q_N       q_f
level_id year   level_name       question value                      
0        2020   Level_1          q1_1     1.0     0.0  82.0  0.000000
                                          2.0     0.0  82.0  0.000000
                                          3.0     3.0  82.0  0.036585
                                          4.0    33.0  82.0  0.402439
                                          5.0    46.0  82.0  0.560976
                                 q1_2     1.0     1.0  82.0  0.012195
                                          2.0     4.0  82.0  0.048780
                                          3.0    11.0  82.0  0.134146
                                          4.0    34.0  82.0  0.414634
                                          5.0    32.0  82.0  0.390244

I'd like to compute a mean for each group, so the value times q_n and theirs sum within each group divided by q_N (each row within the group should get the same mean).

I tried doing something like this:

df['q_m'] = ((df['q_n']*df.index.get_level_values('value')) / (df['q_n'].groupby(['year', 'level_name', 'question']).transform('sum')))

But the ceiling in the division is not getting properly summed. How can I achieve a column q_m that for q1_1 has 4.52 for each row and 4.12 for each row in q1_2?

For testing:

df.reset_index().to_records()

rec.array([(0, 0, 2020., 'level_1', 'q1_1', 1.,  0., 82., 0.        , 0.        ),
           (1, 0, 2020., 'level_1', 'q1_1', 2.,  0., 82., 0.        , 0.        ),
           (2, 0, 2020., 'level_1', 'q1_1', 3.,  3., 82., 0.03658537, 0.1097561 ),
           (3, 0, 2020., 'level_1', 'q1_1', 4., 33., 82., 0.40243902, 1.6097561 ),
           (4, 0, 2020., 'level_1', 'q1_1', 5., 46., 82., 0.56097561, 2.80487805),
           (5, 0, 2020., 'level_1', 'q1_2', 1.,  1., 82., 0.01219512, 0.01219512),
           (6, 0, 2020., 'level_1', 'q1_2', 2.,  4., 82., 0.04878049, 0.09756098),
           (7, 0, 2020., 'level_1', 'q1_2', 3., 11., 82., 0.13414634, 0.40243902),
           (8, 0, 2020., 'level_1', 'q1_2', 4., 34., 82., 0.41463415, 1.65853659),
           (9, 0, 2020., 'level_1', 'q1_2', 5., 32., 82., 0.3902439 , 1.95121951)],
          dtype=[('index', '<i8'), ('level_id', '<i8'), ('year', '<f8'), ('level_name', 'O'), ('question', 'O'), ('value', '<f8'), ('q_n', '<f8'), ('q_N', '<f8'), ('q_f', '<f8'), ('q_m', '<f8')])

df.to_dict()

{'q_n': {(0, 2020.0, 'Level_1', 'q1_1', 1.0): 0.0,
  (0, 2020.0, 'Level_1', 'q1_1', 2.0): 0.0,
  (0, 2020.0, 'Level_1', 'q1_1', 3.0): 3.0,
  (0, 2020.0, 'Level_1', 'q1_1', 4.0): 33.0,
  (0, 2020.0, 'Level_1', 'q1_1', 5.0): 46.0,
  (0, 2020.0, 'Level_1', 'q1_2', 1.0): 1.0,
  (0, 2020.0, 'Level_1', 'q1_2', 2.0): 4.0,
  (0, 2020.0, 'Level_1', 'q1_2', 3.0): 11.0,
  (0, 2020.0, 'Level_1', 'q1_2', 4.0): 34.0,
  (0, 2020.0, 'Level_1', 'q1_2', 5.0): 32.0},
 'q_N': {(0, 2020.0, 'Level_1', 'q1_1', 1.0): 82.0,
  (0, 2020.0, 'Level_1', 'q1_1', 2.0): 82.0,
  (0, 2020.0, 'Level_1', 'q1_1', 3.0): 82.0,
  (0, 2020.0, 'Level_1', 'q1_1', 4.0): 82.0,
  (0, 2020.0, 'Level_1', 'q1_1', 5.0): 82.0,
  (0, 2020.0, 'Level_1', 'q1_2', 1.0): 82.0,
  (0, 2020.0, 'Level_1', 'q1_2', 2.0): 82.0,
  (0, 2020.0, 'Level_1', 'q1_2', 3.0): 82.0,
  (0, 2020.0, 'Level_1', 'q1_2', 4.0): 82.0,
  (0, 2020.0, 'Level_1', 'q1_2', 5.0): 82.0},
 'q_f': {(0, 2020.0, 'Level_1', 'q1_1', 1.0): 0.0,
  (0, 2020.0, 'Level_1', 'q1_1', 2.0): 0.0,
  (0, 2020.0, 'Level_1', 'q1_1', 3.0): 0.036585365853658534,
  (0, 2020.0, 'Level_1', 'q1_1', 4.0): 0.4024390243902439,
  (0, 2020.0, 'Level_1', 'q1_1', 5.0): 0.5609756097560976,
  (0, 2020.0, 'Level_1', 'q1_2', 1.0): 0.012195121951219513,
  (0, 2020.0, 'Level_1', 'q1_2', 2.0): 0.04878048780487805,
  (0, 2020.0, 'Level_1', 'q1_2', 3.0): 0.13414634146341464,
  (0, 2020.0, 'Level_1', 'q1_2', 4.0): 0.4146341463414634,
  (0, 2020.0, 'Level_1', 'q1_2', 5.0): 0.3902439024390244},
 'q_m': {(0, 2020.0, 'Level_1', 'q1_1', 1.0): 0.0,
  (0, 2020.0, 'Level_1', 'q1_1', 2.0): 0.0,
  (0, 2020.0, 'Level_1', 'q1_1', 3.0): 0.10975609756097561,
  (0, 2020.0, 'Level_1', 'q1_1', 4.0): 1.6097560975609757,
  (0, 2020.0, 'Level_1', 'q1_1', 5.0): 2.8048780487804876,
  (0, 2020.0, 'Level_1', 'q1_2', 1.0): 0.012195121951219513,
  (0, 2020.0, 'Level_1', 'q1_2', 2.0): 0.0975609756097561,
  (0, 2020.0, 'Level_1', 'q1_2', 3.0): 0.4024390243902439,
  (0, 2020.0, 'Level_1', 'q1_2', 4.0): 1.6585365853658536,
  (0, 2020.0, 'Level_1', 'q1_2', 5.0): 1.951219512195122}}

Upvotes: 0

Views: 49

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150745

Seems you transform the wrong series:

df['q_m'] = ((df['q_n']*df.index.get_level_values('value'))
     .groupby(['year','level_name','question'])
     .transform('sum')
     .div(df['q_N'])
)

Output:

                                            q_n   q_N       q_f       q_m
level_id year   level_name question value                                
0        2020.0 Level_1    q1_1     1.0     0.0  82.0  0.000000  4.524390
                                    2.0     0.0  82.0  0.000000  4.524390
                                    3.0     3.0  82.0  0.036585  4.524390
                                    4.0    33.0  82.0  0.402439  4.524390
                                    5.0    46.0  82.0  0.560976  4.524390
                           q1_2     1.0     1.0  82.0  0.012195  4.121951
                                    2.0     4.0  82.0  0.048780  4.121951
                                    3.0    11.0  82.0  0.134146  4.121951
                                    4.0    34.0  82.0  0.414634  4.121951
                                    5.0    32.0  82.0  0.390244  4.121951

Upvotes: 1

Related Questions