Reputation: 3432
Hello I have a df such as
COL1 Col2
G1 SP1.3
G1 SP2.3
G1 SP6.2
G1 SP4_4
G1 SP4_2
G1 SP8_2
G2 SP3_2
G2 SP1_3
G2 SP2_4
G2 SP2.2
how can I only subset groupbs (in COL1) where COL2 contains (SP1,SP2 and SP4 occurences ? )
here I should only get the all G1 groups :
COL1 Col2
G1 SP1.3
G1 SP2.3
G1 SP6.2
G1 SP4_4
G1 SP4_2
G1 SP8_2
Upvotes: 1
Views: 30
Reputation: 603
I did the following for G1 (SP8 is not included):
df.loc[(df['COL1']== 'G1') & (df['Col2'].str.contains('SP1|SP2|SP4'))]
df
COL1 Col2
0 G1 SP1.3
1 G1 SP2.3
3 G1 SP4_4
4 G1 SP4_2
Upvotes: 1
Reputation: 863166
Solution with compare sets per groups with get values before .
or _
by Series.str.split
with custom function in GroupBy.transform
:
a = ['SP1','SP2','SP4']
f = lambda x: set(x) >= set(a)
m = df['Col2'].str.split('\.|_').str[0].groupby(df['COL1']).transform(f)
df = df[m]
print (df)
COL1 Col2
0 G1 SP1.3
1 G1 SP2.3
2 G1 SP6.2
3 G1 SP4_4
4 G1 SP4_2
5 G1 SP8_2
EDIT: Solution with Series.str.extract
for get values by list:
a = ['SP1','SP2','SP4']
f = lambda x: set(x) >= set(a)
m = df['Col2'].str.extract(f'({"|".join(a)})',expand=False).groupby(df['COL1']).transform(f)
df = df[m]
Upvotes: 1