Reputation: 67
I want to drop a group (all rows in the group) if the sum of values in a group is equal to a certain value.
The following code provides an example:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(10, 10), index=pd.date_range('20130101', periods=10, freq='min'))
>>> df = pd.DataFrame(df.stack(), columns=['Values'])
>>> df.index.names = ['Time', 'Group']
>>> df.head(12)
                             Values
Time                Group
2013-01-01 00:00:00 0      0.541795
                    1      0.060798
                    2      0.074224
                    3     -0.006818
                    4      1.211791
                    5     -0.066994
                    6     -1.019984
                    7     -0.558134
                    8      2.006748
                    9      2.737199
2013-01-01 00:01:00 0      1.655502
                    1      0.376214
>>> df['Values'].groupby('Group').sum()
Group
0    3.754481
1   -5.234744
2   -2.000393
3    0.991431
4    3.930547
5   -3.137915
6   -1.260719
7    0.145757
8   -1.832132
9    4.258525
Name: Values, dtype: float64
So the question is: how can I, for instance, drop all rows of every group whose sum is negative? In my actual dataset I want to drop the groups where the sum or mean is zero.
Upvotes: 1
Views: 1613
Reputation: 2515
According to the pandas documentation, filtration (GroupBy.filter) seems more suitable:
df2 = df.groupby('Group').filter(lambda g: g['Values'].sum() >= 0)
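For reference, here is a minimal self-contained sketch of the filter approach. The toy data is made up (it is not the asker's random frame) so that the group sums are known in advance. The second filter is one possible way to handle the "sum is zero" case from the question; comparing floats with np.isclose rather than == is my assumption about what is wanted there.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 timestamps x 3 groups with hand-picked values.
times = pd.date_range("2013-01-01", periods=3, freq="min")
index = pd.MultiIndex.from_product([times, [0, 1, 2]], names=["Time", "Group"])
values = [1.0, -2.0, 0.5,
          2.0, -1.0, -0.5,
          3.0, 0.5, 0.0]
df = pd.DataFrame({"Values": values}, index=index)

# Group sums: group 0 -> 6.0, group 1 -> -2.5, group 2 -> 0.0

# Keep only the groups whose sum is non-negative (drops group 1).
kept_nonneg = df.groupby(level="Group").filter(lambda g: g["Values"].sum() >= 0)

# Keep only the groups whose sum is not (approximately) zero (drops group 2).
kept_nonzero = df.groupby(level="Group").filter(
    lambda g: not np.isclose(g["Values"].sum(), 0.0)
)

print(kept_nonneg)
print(kept_nonzero)
```

The filtered frames keep the original (Time, Group) MultiIndex and row order. Swapping sum() for mean() in the lambda covers the mean-based condition.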
(Old answer):
This worked for me:
# Change the index to *just* the `Group` column
df.reset_index(inplace=True)
df.set_index('Group', inplace=True)
# Then create a filter using the groupby object
gb = df['Values'].groupby('Group')
gb_sum = gb.sum()
val_filter = gb_sum[gb_sum >= 0].index
# Print results
print(df.loc[val_filter])
The condition on which you filter can be changed accordingly.
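Note that the snippet above replaces the (Time, Group) MultiIndex with a plain Group index. If the original index needs to survive, a variation along these lines might work; it is only a sketch on made-up data, and the names group_sums, keep and mask are mine.

```python
import pandas as pd

# Hypothetical toy data: 2 timestamps x 2 groups.
times = pd.date_range("2013-01-01", periods=2, freq="min")
index = pd.MultiIndex.from_product([times, ["a", "b"]], names=["Time", "Group"])
df = pd.DataFrame({"Values": [1.0, -3.0, 2.0, 1.0]}, index=index)

# Per-group sums: a -> 3.0, b -> -2.0
group_sums = df.groupby(level="Group")["Values"].sum()

# Labels of the groups that pass the condition.
keep = group_sums[group_sums >= 0].index

# Boolean mask over the rows, built from the Group level of the MultiIndex.
mask = df.index.get_level_values("Group").isin(keep)
result = df[mask]

print(result)
```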
Upvotes: 0
Reputation: 164773
Using GroupBy + transform with sum, followed by Boolean indexing:
res = df[df.groupby('Group')['Values'].transform('sum') > 0]
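A small sketch of how the transform approach behaves on made-up data with known sums. One thing to watch: > 0 also drops groups whose sum is exactly zero, while >= 0 keeps them. The np.isclose variant for the "drop zero-sum groups" case is an assumption on my part about how to compare floats safely.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 2 timestamps x 3 groups.
times = pd.date_range("2013-01-01", periods=2, freq="min")
index = pd.MultiIndex.from_product([times, [0, 1, 2]], names=["Time", "Group"])
df = pd.DataFrame({"Values": [1.5, -2.0, 0.25, 0.5, 1.0, -0.25]}, index=index)

# Group sums: group 0 -> 2.0, group 1 -> -1.0, group 2 -> 0.0
# transform('sum') broadcasts each group's sum back onto every row of that group.
sums = df.groupby(level="Group")["Values"].transform("sum")

positive = df[sums > 0]        # keeps group 0 only
non_negative = df[sums >= 0]   # keeps groups 0 and 2
non_zero = df[~np.isclose(sums.to_numpy(), 0.0)]  # keeps groups 0 and 1

print(positive)
print(non_negative)
print(non_zero)
```

Because transform returns a Series aligned with the original rows, the mask can be combined with other row-level conditions using & and |.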
Upvotes: 4