Grendel

Reputation: 783

Pandas: remove all groups whose col2 does not contain at least one element of a list

Hello, I have a df such as:

col1 col2
G1 A
G1 B
G1 C
G1 D
G2 E
G2 F
G2 G
G3 H
G4 I
G4 J
G4 K

and a list liste = ['A', 'I', 'K'],

and I would like to remove every col1 group whose col2 values do not include at least one element of liste.

Here I should keep only G1 and G4 and get:

col1 col2
G1 A
G1 B
G1 C
G1 D
G4 I
G4 J
G4 K

Does anyone have an idea?
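To make the example reproducible, the frame and list described above can be rebuilt as follows. This is a sketch that only assumes the column names and values shown in the question:

```python
import pandas as pd

# Rebuild the example frame from the question:
# G1 -> A..D, G2 -> E..G, G3 -> H, G4 -> I..K
df = pd.DataFrame({
    "col1": ["G1"] * 4 + ["G2"] * 3 + ["G3"] + ["G4"] * 3,
    "col2": list("ABCDEFGHIJK"),
})
liste = ["A", "I", "K"]

print(df)
```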

Upvotes: 1

Views: 59

Answers (2)

Erfan

Reputation: 42916

Using isin, GroupBy.transform and any

First we use isin to flag the rows whose col2 value is in liste. Then we group that boolean mask by col1 and check whether any row in each group is flagged.
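The two steps can be run separately to inspect the intermediate results. A minimal sketch, assuming the example frame from the question; hit and per_group are illustrative names, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["G1"] * 4 + ["G2"] * 3 + ["G3"] + ["G4"] * 3,
    "col2": list("ABCDEFGHIJK"),
})
liste = ["A", "I", "K"]

# Step 1: row-level flag, True where col2 is one of the wanted values
hit = df["col2"].isin(liste)

# Step 2: per-group test, True if at least one row of the group was flagged
per_group = hit.groupby(df["col1"]).any()

print(hit.tolist())
print(per_group)
```

Only G1 (via A) and G4 (via I and K) have a flagged row, so they are the only groups that come out True.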

We use transform('any') rather than plain GroupBy.any because GroupBy.any returns one value per group, whereas we need a boolean mask with the same length and index as the dataframe to select rows with df[...].
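The shape difference can be checked directly. A small sketch on the question's data; agg and mask are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["G1"] * 4 + ["G2"] * 3 + ["G3"] + ["G4"] * 3,
    "col2": list("ABCDEFGHIJK"),
})
liste = ["A", "I", "K"]
hit = df["col2"].isin(liste)

# GroupBy.any: one value per group (4 entries, indexed by col1)
agg = hit.groupby(df["col1"]).any()

# transform('any'): each group's result broadcast back to all of its rows
# (11 entries, same index as df), so it can be used directly as a mask
mask = hit.groupby(df["col1"]).transform("any")

print(agg.shape, mask.shape)
print(df[mask])
```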

df[df['col2'].isin(liste).groupby(df['col1']).transform('any')]

   col1 col2
0    G1    A
1    G1    B
2    G1    C
3    G1    D
8    G4    I
9    G4    J
10   G4    K

Upvotes: 3

Serge Ballesta

Reputation: 149075

You could use groupby and apply:

df.groupby('col1').apply(lambda x: x if any(i in x['col2'].values for i in liste)
                                            else None).reset_index(level=0, drop=True)

It gives:

   col1 col2
0    G1    A
1    G1    B
2    G1    C
3    G1    D
8    G4    I
9    G4    J
10   G4    K
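A related option, not used in this answer, is GroupBy.filter, which is designed for keeping or dropping whole groups based on a predicate. A sketch under the question's data, with the predicate written using isin instead of the generator expression above:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["G1"] * 4 + ["G2"] * 3 + ["G3"] + ["G4"] * 3,
    "col2": list("ABCDEFGHIJK"),
})
liste = ["A", "I", "K"]

# filter calls the lambda once per col1 group and keeps all rows of the
# groups for which it returns True; the original index is preserved,
# so no reset_index is needed
out = df.groupby("col1").filter(lambda g: g["col2"].isin(liste).any())

print(out)
```

Unlike apply with None, filter does not add the group keys as an extra index level.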

Upvotes: 1
