chippycentra
chippycentra

Reputation: 3432

Subset within groups if column contains list of elements

Hello I have a df such as

COL1 Col2  
G1    SP1.3
G1    SP2.3
G1    SP6.2
G1    SP4_4
G1    SP4_2
G1    SP8_2
G2    SP3_2
G2    SP1_3
G2    SP2_4
G2    SP2.2

how can I only subset groupbs (in COL1) where COL2 contains (SP1,SP2 and SP4 occurences ? )

here I should only get the all G1 groups :

COL1 Col2  
G1    SP1.3
G1    SP2.3
G1    SP6.2
G1    SP4_4
G1    SP4_2
G1    SP8_2

Upvotes: 1

Views: 30

Answers (2)

jlb_gouveia
jlb_gouveia

Reputation: 603

I did the following for G1 (SP8 is not included):

df.loc[(df['COL1']== 'G1') & (df['Col2'].str.contains('SP1|SP2|SP4'))]
df

    COL1    Col2
0   G1      SP1.3
1   G1      SP2.3
3   G1      SP4_4
4   G1      SP4_2

Upvotes: 1

jezrael
jezrael

Reputation: 863166

Solution with compare sets per groups with get values before . or _ by Series.str.split with custom function in GroupBy.transform:

a = ['SP1','SP2','SP4']
f = lambda x: set(x) >= set(a)
m = df['Col2'].str.split('\.|_').str[0].groupby(df['COL1']).transform(f)
df = df[m]
print (df)
  COL1   Col2
0   G1  SP1.3
1   G1  SP2.3
2   G1  SP6.2
3   G1  SP4_4
4   G1  SP4_2
5   G1  SP8_2

EDIT: Solution with Series.str.extract for get values by list:

a = ['SP1','SP2','SP4']
f = lambda x: set(x) >= set(a)
m = df['Col2'].str.extract(f'({"|".join(a)})',expand=False).groupby(df['COL1']).transform(f)
df = df[m]

Upvotes: 1

Related Questions