Reputation: 165
my data frame looks like this:
ID Values
0 0
2 0
3 1
5 0
5 1
8 0
9 0
10 1
11 1
12 1
I would like to drop rows where there is only one instance of ID and the sum of value is 1 my output would look like:
ID Values
0 0
2 0
5 0
5 1
8 0
9 0
Upvotes: 2
Views: 258
Reputation: 1496
In [2]: df = pd.DataFrame({'ID': {0: 0, 1: 2, 2: 3, 3: 5, 4: 5, 5: 8, 6: 9, 7: 10, 8: 11, 9: 12},
...: 'Values': {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0, 7: 1, 8: 1, 9: 1}})
...: df
Out[2]:
ID Values
0 0 0
1 2 0
2 3 1
3 5 0
4 5 1
5 8 0
6 9 0
7 10 1
8 11 1
9 12 1
In [3]: df.drop(set(df[df.Values == 1].index).intersection(df.ID.drop_duplicates(keep=False).index))
Out[3]:
ID Values
0 0 0
1 2 0
3 5 0
4 5 1
5 8 0
6 9 0
Upvotes: 0
Reputation: 323326
Try filter
df.groupby('ID').filter(lambda x : (sum(x['Values'])==1) & (len(['Values'])==1))
Out[409]:
ID Values
2 3 1
3 5 0
4 5 1
7 10 1
8 11 1
9 12 1
Upvotes: 0
Reputation: 26676
df[(df.groupby('ID').Values.transform(lambda x: x.count()>(1)))|\
(df.groupby('ID').Values.transform(lambda x: x.sum()<(1)))]
ID Values
0 0 0
1 2 0
3 5 0
4 5 1
5 8 0
6 9 0
Upvotes: 0
Reputation: 25259
Try this:
df_final = df[df.groupby('ID').Values.transform('sum').ne(1) |
df.ID.duplicated(keep=False)]
Out[31]:
ID Values
0 0 0
1 2 0
3 5 0
4 5 1
5 8 0
6 9 0
Upvotes: 4
Reputation: 4929
You can calculate both count
and sum
for each ID to get the index from count==1
or sum==1
, removing them afterwards.
idx = (df.groupby('ID').agg(['count','sum']).droplevel(0, axis=1)
.apply(lambda x: x['count']==1 & x['sum']==1, axis=1))
df[df.ID.isin(idx[~idx].index)]
Output:
ID Values
0 0 0
1 2 0
3 5 0
4 5 1
5 8 0
6 9 0
Upvotes: 0