Reputation: 357
I have a multi-level index pandas dataframe. I would like to create a new column where the values in this column are based on a condition. This condition is based on summing another column for that index, which is then halved. If this is less than the last value which is stored in a separate list then the value in the new column takes the same values as another column in the dataframe. If this condition is not met then all values in the new column should be 0
.
Using this question to try and achieve this Sum columns by level in a Multi-Index DataFrame I have used a combination of np.where
and df.sum(level=0, axis=1)
but this results in the following error:
ValueError: operands could not be broadcast together with shapes (2,8) (21,) ()
Here is an example of my dataframe and the code I have used thus far:
import pandas as pd
import numpy as np
balance = [1400]
data = {'EVENT_ID': [112335580,112335580,112335580,112335580,112335580,112335580,112335580,112335580, 112335582,
112335582,112335582,112335582,112335582,112335582,112335582,112335582,112335582,112335582,
112335582,112335582,112335582],
'SELECTION_ID': [6356576,2554439,2503211,6297034,4233251,2522967,5284417,7660920,8112876,7546023,8175276,8145908,
8175274,7300754,8065540,8175275,8106158,8086265,2291406,8065533,8125015],
'Pot_Bet': [3.236731,2.416966,2.278365,2.264023,2.225353,2.174407, 2.141420,2.122386,2.832997,2.411094,
2.167218,2.138972,2.132137,2.128341,2.116338,2.115239,2.115123,2.114284362,2.113420,
2.113186,2.112729],
'Liability':[3.236731, 2.416966, 12.245492, 12.795112, 15.079176, 23.336171, 50.741182, 571.003118, 2.832997, 6.691736, 15.808607, 27.935834, 35.954927, 43.275250, 147.165537, 193.017915, 199.622454, 265.809019, 405.808678, 473.926781, 706.332594]}
df = pd.DataFrame(data, columns=['EVENT_ID', 'SELECTION_ID', 'Pot_Bet','WIN_LOSE'])
df.set_index(['EVENT_ID', 'SELECTION_ID'], inplace=True) #Selecting columns for indexing
df['Bet'] = np.where(df.sum(level = 0) > 0.5*balance[-1], df['Pot_Bet'], 0)
This results in the error stated earlier.
For index 112335580
the new column should have the same values as 'Pot_Bet'
. While for the index 112335582
the new column should have values of 0
.
Cheers, Sandy
Upvotes: 2
Views: 3014
Reputation: 862771
Problem is if use df.sum(level=0)
it is same like df.groupby(level = 0).sum()
- aggregation by first level of MultiIndex
.
Solution is use GroupBy.transform
for Series
with same size like original DataFrame
:
df['Bet'] = np.where(df.groupby(level = 0)['Pot_Bet'].transform('sum') > 0.5*balance[-1],
df['Pot_Bet'], 0)
Detail:
print (df.groupby(level = 0)['Pot_Bet'].transform('sum'))
EVENT_ID SELECTION_ID
112335580 6356576 18.859651
2554439 18.859651
2503211 18.859651
6297034 18.859651
4233251 18.859651
2522967 18.859651
5284417 18.859651
7660920 18.859651
112335582 8112876 28.611078
7546023 28.611078
8175276 28.611078
8145908 28.611078
8175274 28.611078
7300754 28.611078
8065540 28.611078
8175275 28.611078
8106158 28.611078
8086265 28.611078
2291406 28.611078
8065533 28.611078
8125015 28.611078
Name: Pot_Bet, dtype: float64
If need working wit hone column only is possible select it for Series
by column name:
print (df['Pot_Bet'].sum(level=0))
EVENT_ID
112335580 18.859651
112335582 28.611078
Name: Pot_Bet, dtype: float64
print (df.groupby(level = 0)['Pot_Bet'].sum())
EVENT_ID
112335580 18.859651
112335582 28.611078
Name: Pot_Bet, dtype: float64
Upvotes: 2