Jason
Jason

Reputation: 165

how can I drop rows in pandas where count and sum are 1

my data frame looks like this:

ID      Values
0        0
2        0
3        1
5        0
5        1
8        0
9        0
10       1
11       1
12       1

I would like to drop rows where there is only one instance of ID and the sum of value is 1 my output would look like:

ID      Values
0        0
2        0
5        0
5        1
8        0
9        0

Upvotes: 2

Views: 258

Answers (5)

Amir saleem
Amir saleem

Reputation: 1496

In [2]: df = pd.DataFrame({'ID': {0: 0, 1: 2, 2: 3, 3: 5, 4: 5, 5: 8, 6: 9, 7: 10, 8: 11, 9: 12},
   ...:  'Values': {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0, 7: 1, 8: 1, 9: 1}})
   ...: df
Out[2]: 
   ID  Values
0   0       0
1   2       0
2   3       1
3   5       0
4   5       1
5   8       0
6   9       0
7  10       1
8  11       1
9  12       1

In [3]: df.drop(set(df[df.Values == 1].index).intersection(df.ID.drop_duplicates(keep=False).index))
Out[3]: 
   ID  Values
0   0       0
1   2       0
3   5       0
4   5       1
5   8       0
6   9       0

Upvotes: 0

BENY
BENY

Reputation: 323326

Try filter

df.groupby('ID').filter(lambda x : (sum(x['Values'])==1) & (len(['Values'])==1))
Out[409]: 
   ID  Values
2   3       1
3   5       0
4   5       1
7  10       1
8  11       1
9  12       1

Upvotes: 0

wwnde
wwnde

Reputation: 26676

df[(df.groupby('ID').Values.transform(lambda x: x.count()>(1)))|\
 (df.groupby('ID').Values.transform(lambda x: x.sum()<(1)))]



 ID  Values
0   0       0
1   2       0
3   5       0
4   5       1
5   8       0
6   9       0

Upvotes: 0

Andy L.
Andy L.

Reputation: 25259

Try this:

df_final = df[df.groupby('ID').Values.transform('sum').ne(1) | 
              df.ID.duplicated(keep=False)]


Out[31]:
   ID  Values
0   0       0
1   2       0
3   5       0
4   5       1
5   8       0
6   9       0

Upvotes: 4

You can calculate both count and sum for each ID to get the index from count==1 or sum==1, removing them afterwards.

idx = (df.groupby('ID').agg(['count','sum']).droplevel(0, axis=1)
       .apply(lambda x: x['count']==1 & x['sum']==1, axis=1))
df[df.ID.isin(idx[~idx].index)]

Output:

   ID  Values
0   0       0
1   2       0
3   5       0
4   5       1
5   8       0
6   9       0

Upvotes: 0

Related Questions