Reputation: 127
I'm trying to perform a groupby aggregation. My dummy Dataframe looks like this:
print (df)
ID Industry Value 1 Value 2
0 1 Finance 0.25 99
1 1 Finance 0.50 73
2 1 Finance 0.25 53
3 1 Teaching 0.75 80
4 1 Teaching 0.25 78
5 1 Teaching 0.50 99
6 2 Finance 0.50 75
7 2 Finance 0.25 56
8 2 Finance 0.25 80
9 2 Teaching 0.50 79
10 3 Finance 0.25 61
11 3 Finance 0.75 87
12 3 Finance 0.75 97
13 3 Finance 0.25 99
14 3 Finance 0.25 76
15 3 Teaching 0.25 73
16 3 Teaching 0.75 68
17 3 Teaching 0.25 59
18 3 Teaching 0.25 60
I want to group by ID and Industry and I want to create a new field called say "Expected". Expected will be equal to:
I'd like to avoid a loop if possible. Any help would appreciated as multiple attempts using iloc, groupby agg, groupby transform have come up short for me.
Upvotes: 2
Views: 204
Reputation: 862511
First set new column by numpy.where
and duplicated
and then use DataFrameGroupBy.cumsum
:
m = df.duplicated(['ID','Industry'])
df['new'] = np.where(m, -df['Value 1'] * df['Value 2'], df['Value 1'] + df['Value 2'])
df['new'] = df.groupby(['ID','Industry'])['new'].cumsum()
print (df)
ID Industry Value 1 Value 2 new
0 1 Finance 0.25 99 99.25
1 1 Finance 0.50 73 62.75
2 1 Finance 0.25 53 49.50
3 1 Teaching 0.75 80 80.75
4 1 Teaching 0.25 78 61.25
5 1 Teaching 0.50 99 11.75
6 2 Finance 0.50 75 75.50
7 2 Finance 0.25 56 61.50
8 2 Finance 0.25 80 41.50
9 2 Teaching 0.50 79 79.50
10 3 Finance 0.25 61 61.25
11 3 Finance 0.75 87 -4.00
12 3 Finance 0.75 97 -76.75
13 3 Finance 0.25 99 -101.50
14 3 Finance 0.25 76 -120.50
15 3 Teaching 0.25 73 73.25
16 3 Teaching 0.75 68 22.25
17 3 Teaching 0.25 59 7.50
18 3 Teaching 0.25 60 -7.50
Upvotes: 4