Reputation: 3529
For an example, I have a simple DataFrame like:
index | data1 | replace_me | agg_me | ID |
---|---|---|---|---|
1 | 100 | (+) | 25 | 1 |
2 | 200 | (-) | 35 | 2 |
3 | 200 | (+) | 45 | 2 |
4 | 300 | (+) | 55 | 3 |
5 | 400 | (+) | 10 | 4 |
6 | 400 | (+) | 10 | 4 |
7 | 400 | (-) | 10 | 4 |
8 | 400 | (-) | 10 | 4 |
I am trying to aggregate some rows together, whereby there exists a len(groupby of ID) > 1. In the cases where len(groupby ID) > 1, I am looking to:
index | data1 | replace_me | agg_me | ID | Percent |
---|---|---|---|---|---|
1 | 100 | (+) | 25 | 1 | 0 |
2 | 200 | (=) | 80 | 2 | 0.4375 |
4 | 300 | (+) | 55 | 3 | 0 |
5 | 400 | (=) | 20 | 4 | 0.5 |
6 | 400 | (=) | 20 | 4 | 0.5 |
Any help is appreciated!
Upvotes: 0
Views: 277
Reputation: 8768
Try this:
vc = df['ID'].map(df['ID'].value_counts()).gt(1)
pd.concat([df.loc[~vc],
df.loc[vc]
.groupby(['ID',df.groupby('ID').cumcount().floordiv(2)]).agg(
index = ('index','first'),
data1 = ('data1','first'),
replace_me = ('replace_me',lambda x: '(=)'),
agg_me = ('agg_me','sum'),
Percent = ('agg_me',lambda x: x.min()/x.sum()))
.reset_index(level=0)]).fillna(0).sort_values('ID').reset_index(drop=True)
Output:
index data1 replace_me agg_me ID Percent
0 1 100 (+) 25 1 0.0000
1 2 200 (=) 80 2 0.4375
2 4 300 (+) 55 3 0.0000
3 5 400 (=) 20 4 0.5000
4 7 400 (=) 20 4 0.5000
Upvotes: 2