Reputation: 210
I can do a single word search in each column but unable to search user-provided number of string search with "and" "or" option
0 1 3 4
0 [OH-] [Na+] NAN CCO
1 [OH-] [Na+] CCO Cl
This one works
search = 'CCO'
df.loc[df.isin([search]).any(axis=1)].index.tolist()
For multi-search I tried
import re
terms = ['C1', 'CCO']
p = r'\b(?:{})\b'.format('|'.join(map(re.escape, terms)))
df[df['col'].str.contains(p)]
Gives me KeyError: 'col'
Expected output
Search='C1' AND '[NA+]
Results 1
Search='CCO' OR 'C1'
Results 0 1
Upvotes: 2
Views: 111
Reputation: 4455
I created your dataframe this way:
df = pd.DataFrame( { 0 : ["[OH-]","[Na+]","NAN","CCO" ], 1 : ["[OH-]","[Na+]","CCO","Cl"] } ).transpose()
Yielding this df:
0 1 2 3
0 [OH-] [Na+] NAN CCO
1 [OH-] [Na+] CCO Cl
I observed that you can do your OR logic with the isin() function on the df:
df.isin(['CCO','C1'])
Yields:
0 1 2 3
0 False False False True
1 False False True False
And so you can figure out which rows match using any(1)
as you are using:
df.isin(['CCO','C1']).any(1).index.tolist()
Yields:
[0, 1]
Logic for AND:
The snippet below looks for each term individually and accumulates them in the results dataframe. After finding matching columns, the number of matches in each row is checked to see if it matches the number of terms.
results = pd.DataFrame()
terms = [ 'Cl', '[Na+]' ]
for term in terms:
if results.empty:
results = df.isin( [ term ] )
else:
results |= df.isin( [ term ] )
results['count'] = results.sum(axis=1)
print( results[ results['count'] == len( terms ) ].index.tolist() )
I know there is a better way - but this way works (I think)
The above code yields [1]
for terms = [ 'Cl', '[Na+]' ]
and [0,1]
for terms = [ "[OH-]","[Na+]" ]
.
Upvotes: 1
Reputation: 93161
Because there isn't a column name col
. Try this:
df[df.apply(lambda col: col.str.contains(p)).any(axis=1)]
col
is now the name of an input parameter to the lambda.
Upvotes: 1