Reputation: 783
I have a dataframe such as:
COL1 COL2
G1 AHA_(+)jjd
G1 6EGEGUG
G1 897E97eh
G1 77E97E
G2 8JHEJE_(-)
G2 8JHEJE_(+)
G3 TTTD
G3 YYYDD
G4 DTTDHD
G4 DYD
G5 tTDHD(+)
G6 DGDGGD
and I would like to collect into a list the COL1 groups (the G numbers) for which every COL2 value contains the pattern (), i.e. a part in parentheses.
Here the expected output is:
print(list)
[G2,G5]
Thanks for your help.
Upvotes: 1
Views: 44
Reputation: 862711
Use Series.str.contains to build a mask, negate masks with ~, and test the matched values with Series.isin:
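# rows whose COL2 contains (...); the raw string avoids invalid-escape warnings
m1 = df['COL2'].str.contains(r'\(.*\)')
# rows whose COL1 group has at least one COL2 value without ()
m2 = df['COL1'].isin(df.loc[~m1, 'COL1'])
# keep only the groups where every COL2 value has ()
out = df.loc[~m2, 'COL1'].unique()
print(out)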
['G2' 'G5']
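For anyone who wants to run this end to end, here is a minimal sketch of the mask approach. The DataFrame construction is my reconstruction of the table in the question, and `result` is just an illustrative name. It also converts the array to a plain Python list, since that is what the question asks for:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be plain string columns)
df = pd.DataFrame({
    'COL1': ['G1', 'G1', 'G1', 'G1', 'G2', 'G2',
             'G3', 'G3', 'G4', 'G4', 'G5', 'G6'],
    'COL2': ['AHA_(+)jjd', '6EGEGUG', '897E97eh', '77E97E',
             '8JHEJE_(-)', '8JHEJE_(+)', 'TTTD', 'YYYDD',
             'DTTDHD', 'DYD', 'tTDHD(+)', 'DGDGGD'],
})

# True where COL2 contains an opening and a later closing parenthesis
m1 = df['COL2'].str.contains(r'\(.*\)')
# True for every row of a group that has at least one non-matching row
m2 = df['COL1'].isin(df.loc[~m1, 'COL1'])
# Groups left over are those where all rows matched; convert to a list
result = df.loc[~m2, 'COL1'].unique().tolist()
print(result)
```

Note that G1 is excluded because only one of its four rows contains parentheses.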
Or use GroupBy.all to test whether all values are True per group, and then filter the index values:
s = df['COL2'].str.contains(r'\(.*\)').groupby(df['COL1']).all()
out = s.index[s]
print(out)
Index(['G2','G5'], dtype='object', name='COL1')
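If a plain list is needed rather than an Index, the groupby variant can be finished with `.tolist()`. The following is a small self-contained sketch. The sample data is again my reconstruction of the question's table, and `has_parens` and `groups` are illustrative names (the question's `print(list)` would shadow the built-in `list`, so a different name is used here):

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be plain string columns)
df = pd.DataFrame({
    'COL1': ['G1', 'G1', 'G1', 'G1', 'G2', 'G2',
             'G3', 'G3', 'G4', 'G4', 'G5', 'G6'],
    'COL2': ['AHA_(+)jjd', '6EGEGUG', '897E97eh', '77E97E',
             '8JHEJE_(-)', '8JHEJE_(+)', 'TTTD', 'YYYDD',
             'DTTDHD', 'DYD', 'tTDHD(+)', 'DGDGGD'],
})

# Boolean Series: does each COL2 value contain a parenthesized part?
has_parens = df['COL2'].str.contains(r'\(.*\)')
# One boolean per COL1 group: True only if every row in the group matched
s = has_parens.groupby(df['COL1']).all()
# Keep the group labels where the result is True, as a Python list
groups = s.index[s].tolist()
print(groups)
```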
Upvotes: 1