Reputation: 36604
I hope the title isn't too confusing.
I have a dataframe, and I'm trying to keep only the rows, grouped by 'id', that all share the same value in 'validity'. In other words, the three rows with id=0 must be kept only if the corresponding three values in validity are the same (either all three are 0, or all three are 1). Put another way, if the three rows that share an id don't all share the same validity, filter them out.
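import pandas as pd
import numpy as np
# six ids (0..5), three rows each
a = np.repeat(np.arange(6), 3)
# random validity flags, mostly 1 (no seed set, so values differ between runs)
b = np.random.choice([0, 1], p=[.15, .85], size=18)
c = np.repeat(['dog', 'cat', 'platypus'], 6)
df = pd.DataFrame(list(zip(a, b, c)), columns=['id', 'validity', 'kind'])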
    id  validity      kind
0    0         1       dog
1    0         1       dog
2    0         0       dog
3    1         1       dog  # only keep those
4    1         1       dog  # only keep those
5    1         1       dog  # only keep those
6    2         1       cat
7    2         0       cat
8    2         1       cat
9    3         1       cat
10   3         1       cat
11   3         0       cat
12   4         1  platypus
13   4         0  platypus
14   4         1  platypus
15   5         1  platypus
16   5         0  platypus
17   5         1  platypus
Upvotes: 1
Views: 81
Reputation: 1769
You can filter with nunique and transform:
df = df[df.groupby(['id', 'kind'])['validity'].transform('nunique') == 1]
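To see why this works, here is a minimal sketch on a small fixed frame. The data is made up for illustration (it is not the random frame from the question), and it groups by 'id' alone, which is an assumption that 'kind' never varies within an id. transform('nunique') computes the number of distinct validity values per group and broadcasts that count back onto every row, so the comparison with 1 yields a boolean mask aligned with the original index.

```python
import pandas as pd

# Hypothetical, deterministic data chosen for illustration only
df = pd.DataFrame({
    'id':       [0, 0, 0, 1, 1, 1, 2, 2, 2],
    'validity': [1, 1, 0, 1, 1, 1, 0, 0, 0],
    'kind':     ['dog'] * 3 + ['cat'] * 3 + ['platypus'] * 3,
})

# Distinct validity values per id, repeated on every row of that id
n_distinct = df.groupby('id')['validity'].transform('nunique')

# Keep only rows whose group has exactly one distinct validity value
kept = df[n_distinct == 1]
print(kept)
```

Here id 0 mixes 1 and 0, so its rows are dropped, while ids 1 and 2 are kept because each has a single validity value. One caveat: nunique ignores NaN by default, so a group containing one real value plus NaN would still count as uniform.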
Upvotes: 2