Keep only values, grouped by a column, which share a value in another column

Question

I hope the title isn't too confusing.

I have a dataframe, and I'm trying to keep only the rows, grouped by 'id', whose values in 'validity' are all the same. In other words, the three rows with id=0 must be kept only if their three validity values are identical (either all three are 0, or all three are 1).

Put another way: if the three rows that share an id don't all have the same validity, filter them out.

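import pandas as pd
import numpy as np

# ids 0..5, each repeated 3 times
a = np.repeat(np.arange(6), 3)
# random validity flags (0 with p=.15, 1 with p=.85), so the output differs between runs
b = np.random.choice([0, 1], p=[.15, .85], size=18)
# each kind is repeated 6 times, i.e. it covers two consecutive ids
c = np.repeat(['dog', 'cat', 'platypus'], 6)

df = pd.DataFrame(zip(a, b, c), columns=['id', 'validity', 'kind'])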

    id  validity      kind
0    0         1       dog
1    0         1       dog
2    0         0       dog
3    1         1       dog # only keep those
4    1         1       dog # only keep those
5    1         1       dog # only keep those
6    2         1       cat
7    2         0       cat
8    2         1       cat
9    3         1       cat
10   3         1       cat
11   3         0       cat
12   4         1  platypus
13   4         0  platypus
14   4         1  platypus
15   5         1  platypus
16   5         0  platypus
17   5         1  platypus
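Because the setup above draws validity at random, re-running it will not reproduce this exact table. A minimal sketch that hard-codes the displayed values instead (the literal lists are transcribed from the printed table, not produced by the original code):

```python
import pandas as pd

# Hard-coded copy of the printed table, so results are reproducible
# (np.random.choice in the question gives different values on each run).
df = pd.DataFrame({
    'id': [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    'validity': [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1],
    'kind': ['dog'] * 6 + ['cat'] * 6 + ['platypus'] * 6,
})
print(df)
```

Only id 1 has a single distinct validity value, which matches the rows marked "only keep those".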

Jonas · Accepted Answer

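You can filter with groupby and transform('nunique'). transform broadcasts each group's number of distinct validity values back to every row of that group, so comparing it with 1 gives a boolean mask that keeps only the groups whose rows all share a single validity value: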

df = df[df.groupby(['id', 'kind'])['validity'].transform('nunique') == 1]
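A self-contained sketch of the answer applied to the values from the question's table, assuming they are hard-coded rather than drawn at random. It also shows groupby(...).filter as an alternative technique (not part of the original answer); since each id maps to exactly one kind in this data, grouping by 'id' alone gives the same result:

```python
import pandas as pd

# Values transcribed from the question's printed table (not random).
df = pd.DataFrame({
    'id': [0] * 3 + [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3 + [5] * 3,
    'validity': [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1],
    'kind': ['dog'] * 6 + ['cat'] * 6 + ['platypus'] * 6,
})

# transform('nunique') returns a Series aligned with df's index: each row
# gets the number of distinct 'validity' values in its ('id', 'kind') group.
counts = df.groupby(['id', 'kind'])['validity'].transform('nunique')
kept = df[counts == 1]

# Alternative: filter keeps or drops whole groups depending on whether the
# function returns True for that group. Here we group by 'id' only.
kept_filter = df.groupby('id').filter(lambda g: g['validity'].nunique() == 1)

print(kept)
print(kept_filter)
```

Both versions keep only rows 3, 4 and 5 (id 1). The transform version builds a row-aligned mask with a built-in aggregation, which is typically faster on many groups; filter calls a Python function once per group but reads more directly.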
