Dinosaurius
Dinosaurius

Reputation: 8628

Delete groups of rows based on condition

I have the following pandas dataframe:

df =

A          B       C
111-ABC    123    EEE
111-ABC    222    EEE
111-ABC    444    XXX
222-CCC    222    YYY
222-CCC    333    67T
333-DDD    123    TTT
333-DDD    123    BTB
333-DDD    444    XXX
333-DDD    555    AAA

I want to delete all groups of rows (grouped by A) that do not contain 123 in the column B.

The expected result is this one (the group of rows 222-CCC is deleted):

result =

A          B       C
111-ABC    123    EEE
111-ABC    222    EEE
111-ABC    444    XXX
333-DDD    123    TTT
333-DDD    123    BTB
333-DDD    444    AAA

How to do it? I assume that first of all I should use groupby, but how to filter out the groups of rows, not just particular rows?

result = df.groupby("A").... ??

Upvotes: 5

Views: 1202

Answers (2)

piRSquared
piRSquared

Reputation: 294248

using query

a = df.query('B == 123').A.unique()
df.query('A in @a')

         A    B    C
0  111-ABC  123  EEE
1  111-ABC  222  EEE
2  111-ABC  444  XXX
5  333-DDD  123  TTT
6  333-DDD  123  BTB
7  333-DDD  444  XXX
8  333-DDD  555  AAA

You can include additional conditions within the first query

b = df.query('B == 123 & C == "EEE"').A.unique()
df.query('A in @b')

          A    B    C
0  111-ABC  123  EEE
1  111-ABC  222  EEE
2  111-ABC  444  XXX

If speed is important. Try this.

cond1 = df.B.values == 123
a = np.unique(df.A.values[cond1])
df.loc[df.A.isin(a)]

Upvotes: 3

akuiper
akuiper

Reputation: 214947

You can use groupby().filter() syntax:

df.groupby('A').filter(lambda g: (g.B == 123).any())

enter image description here

Upvotes: 4

Related Questions