How to use the sum values from a column in a multi-level indexed pandas dataframe as a condition for values in new column

Question

I have a multi-level index pandas dataframe. I would like to create a new column where the values in this column are based on a condition. This condition is based on summing another column for that index, which is then halved. If this is less than the last value which is stored in a separate list then the value in the new column takes the same values as another column in the dataframe. If this condition is not met then all values in the new column should be 0.

Using this question to try and achieve this Sum columns by level in a Multi-Index DataFrame I have used a combination of np.where and df.sum(level=0, axis=1) but this results in the following error:

ValueError: operands could not be broadcast together with shapes (2,8) (21,) ()

Here is an example of my dataframe and the code I have used thus far:

import pandas as pd
import numpy as np

balance = [1400]

data = {'EVENT_ID': [112335580,112335580,112335580,112335580,112335580,112335580,112335580,112335580, 112335582,
                     112335582,112335582,112335582,112335582,112335582,112335582,112335582,112335582,112335582,
                     112335582,112335582,112335582],

 'SELECTION_ID': [6356576,2554439,2503211,6297034,4233251,2522967,5284417,7660920,8112876,7546023,8175276,8145908,
                  8175274,7300754,8065540,8175275,8106158,8086265,2291406,8065533,8125015],

 'Pot_Bet': [3.236731,2.416966,2.278365,2.264023,2.225353,2.174407, 2.141420,2.122386,2.832997,2.411094,
         2.167218,2.138972,2.132137,2.128341,2.116338,2.115239,2.115123,2.114284362,2.113420,
         2.113186,2.112729],

  'Liability':[3.236731, 2.416966, 12.245492, 12.795112, 15.079176, 23.336171, 50.741182, 571.003118, 2.832997, 6.691736, 15.808607, 27.935834, 35.954927, 43.275250, 147.165537, 193.017915, 199.622454, 265.809019, 405.808678, 473.926781, 706.332594]}

df = pd.DataFrame(data, columns=['EVENT_ID', 'SELECTION_ID', 'Pot_Bet','WIN_LOSE'])

df.set_index(['EVENT_ID', 'SELECTION_ID'], inplace=True) #Selecting columns for indexing

df['Bet'] = np.where(df.sum(level = 0) > 0.5*balance[-1], df['Pot_Bet'], 0)

This results in the error stated earlier.

For index 112335580 the new column should have the same values as 'Pot_Bet'. While for the index 112335582 the new column should have values of 0.

Cheers, Sandy

jezrael · Accepted Answer

Problem is if use df.sum(level=0) it is same like df.groupby(level = 0).sum() - aggregation by first level of MultiIndex.

Solution is use GroupBy.transform for Series with same size like original DataFrame:

df['Bet'] = np.where(df.groupby(level = 0)['Pot_Bet'].transform('sum') > 0.5*balance[-1], 
                     df['Pot_Bet'], 0)

Detail:

print (df.groupby(level = 0)['Pot_Bet'].transform('sum'))
EVENT_ID   SELECTION_ID
112335580  6356576         18.859651
           2554439         18.859651
           2503211         18.859651
           6297034         18.859651
           4233251         18.859651
           2522967         18.859651
           5284417         18.859651
           7660920         18.859651
112335582  8112876         28.611078
           7546023         28.611078
           8175276         28.611078
           8145908         28.611078
           8175274         28.611078
           7300754         28.611078
           8065540         28.611078
           8175275         28.611078
           8106158         28.611078
           8086265         28.611078
           2291406         28.611078
           8065533         28.611078
           8125015         28.611078
Name: Pot_Bet, dtype: float64

If need working wit hone column only is possible select it for Series by column name:

print (df['Pot_Bet'].sum(level=0))
EVENT_ID
112335580    18.859651
112335582    28.611078
Name: Pot_Bet, dtype: float64

print (df.groupby(level = 0)['Pot_Bet'].sum())
EVENT_ID
112335580    18.859651
112335582    28.611078
Name: Pot_Bet, dtype: float64

How to use the sum values from a column in a multi-level indexed pandas dataframe as a condition for values in new column

Answers (1)

Related Questions