Grendel

Reputation: 783

Groupby and extract groups containing only value with a pattern

I have a dataframe such as:

COL1 COL2 
G1 AHA_(+)jjd
G1 6EGEGUG
G1 897E97eh
G1 77E97E
G2 8JHEJE_(-)
G2 8JHEJE_(+)
G3 TTTD
G3 YYYDD
G4 DTTDHD
G4 DYD
G5 tTDHD(+)
G6 DGDGGD

I would like to collect into a list the group names from COL1 (the G numbers) for which every COL2 value contains a parenthesized pattern such as (+) or (-).

Here the expected output is:

print(list)

[G2,G5]

Thanks for your help.

Upvotes: 1

Views: 44

Answers (1)

jezrael

Reputation: 862711

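Use Series.str.contains to build a mask of the rows that match, negate it with ~ to find the groups that have at least one non-matching row, and test group membership with Series.isin:

# rows whose COL2 contains (...)
m1 = df['COL2'].str.contains(r'\(.*\)')
# rows whose COL1 group has at least one value without (...)
m2 = df['COL1'].isin(df.loc[~m1, 'COL1'])

# keep only groups where every value contains (...)
out = df.loc[~m2, 'COL1'].unique()
print(out)
['G2' 'G5']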
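
For anyone who wants to run the mask approach end to end, here is a minimal self-contained sketch. The dataframe is rebuilt from the sample in the question; the variable names has_parens, bad_groups and result are only illustrative and not part of the original answer. The result is wrapped in list() because the question asks for a plain list.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COL1": ["G1", "G1", "G1", "G1", "G2", "G2",
             "G3", "G3", "G4", "G4", "G5", "G6"],
    "COL2": ["AHA_(+)jjd", "6EGEGUG", "897E97eh", "77E97E",
             "8JHEJE_(-)", "8JHEJE_(+)", "TTTD", "YYYDD",
             "DTTDHD", "DYD", "tTDHD(+)", "DGDGGD"],
})

# True for rows whose COL2 has "(" followed later by ")".
has_parens = df["COL2"].str.contains(r"\(.*\)")

# COL1 labels of rows that do NOT match; any group listed here is disqualified.
bad_groups = df.loc[~has_parens, "COL1"]

# Keep rows whose group never appears in bad_groups, then deduplicate.
result = list(df.loc[~df["COL1"].isin(bad_groups), "COL1"].unique())
print(result)  # ['G2', 'G5']
```

G1 is excluded even though its first row matches, because its other rows do not.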

Or use GroupBy.all to test whether all values are True within each group, then filter the index by the result:

s = df['COL2'].str.contains(r'\(.*\)').groupby(df['COL1']).all()

out = s.index[s]
print(out)
Index(['G2', 'G5'], dtype='object', name='COL1')
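
The GroupBy.all variant can also be tried in isolation with the sketch below. It assumes the table from the question is available as whitespace-separated text, which is read with pd.read_csv; the names raw, all_match and groups are hypothetical. Calling tolist() on the filtered index gives the plain Python list the question asks for.

```python
import io

import pandas as pd

# The table from the question, as whitespace-separated text.
raw = """COL1 COL2
G1 AHA_(+)jjd
G1 6EGEGUG
G1 897E97eh
G1 77E97E
G2 8JHEJE_(-)
G2 8JHEJE_(+)
G3 TTTD
G3 YYYDD
G4 DTTDHD
G4 DYD
G5 tTDHD(+)
G6 DGDGGD
"""

# dtype=str keeps every value as text so .str methods apply to all rows.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype=str)

# One boolean per group: True only if every COL2 value in it matches.
all_match = df["COL2"].str.contains(r"\(.*\)").groupby(df["COL1"]).all()

# Select the True entries and take their group labels as a list.
groups = all_match[all_match].index.tolist()
print(groups)  # ['G2', 'G5']
```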

Upvotes: 1
