I have a dataframe which can be created with the code below:
import pandas as pd
df2 = pd.DataFrame({'level_0': ['No case notes','Notes','1.Chinese','2.Widowed','No']})
It has a single column, level_0, with five rows indexed 0 to 4.
I also have an input list:
input_terms = ['No','Widowed','Chinese']
I would like to search for these terms in the dataframe and get their indices. How can I get output like this?
[4, 3, 2]  # index list from the dataframe for my input terms, in input order
As you can see, I don't want the result to include 'No case notes' or 'Notes', even though they contain 'No' as part of their string. Here I want an exact match.
But for the input terms 'Chinese' and 'Widowed', I want the result to include '1.Chinese' and '2.Widowed'. Here I am interested in something like the str.contains method.
How can I combine an exact match and a regex/str.contains approach when searching for a string?
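To see why neither approach works on its own, here is a small sketch on the sample data comparing a plain substring search with an exact match (the names naive and exact are just for illustration):

```python
import pandas as pd

df2 = pd.DataFrame({'level_0': ['No case notes', 'Notes', '1.Chinese', '2.Widowed', 'No']})

# Plain substring search: 'No' also hits 'No case notes' and 'Notes'
naive = df2.index[df2['level_0'].str.contains('No')].tolist()

# Exact equality: finds 'No' but misses '1.Chinese' and '2.Widowed'
exact = df2.index[df2['level_0'].isin(['No', 'Widowed', 'Chinese'])].tolist()

print(naive)  # [0, 1, 4]
print(exact)  # [4]
```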
If order of index values is not important:
import pandas as pd
df2= pd.DataFrame({'level_0': ['No case notes','notes','1.Chinese','2.Widowed','No']})
input_terms = ['No','Widowed','Chinese']
pat = '|'.join(r"\d+\.{}$".format(x) for x in input_terms)
m1 = df2['level_0'].str.contains(pat)
m2 = df2['level_0'].isin(input_terms)
idx = df2.index[m1 | m2]
print (idx)
Int64Index([2, 3, 4], dtype='int64')
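A possible single-mask variant, assuming pandas 1.1 or later (where Series.str.fullmatch exists): make the numeric prefix optional so one pattern covers both the exact and the numbered case. The terms go through re.escape in case they ever contain regex metacharacters. This is a sketch, not part of the original answer.

```python
import re
import pandas as pd

df2 = pd.DataFrame({'level_0': ['No case notes', 'notes', '1.Chinese', '2.Widowed', 'No']})
input_terms = ['No', 'Widowed', 'Chinese']

# Optional "<digits>." prefix, then exactly one of the escaped terms;
# fullmatch requires the whole cell to match, so 'No case notes' is rejected
pat = r'(?:\d+\.)?(?:{})'.format('|'.join(map(re.escape, input_terms)))

mask = df2['level_0'].str.fullmatch(pat)
idx = df2.index[mask].tolist()
print(idx)  # [2, 3, 4]
```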
If order is important:
input_terms = ['No','Widowed','Chinese']
out = []
for x in input_terms:
    # exact matches, e.g. 'No'
    a = df2.index[df2['level_0'] == x]
    # numbered entries, e.g. '1.Chinese'
    b = df2.index[df2['level_0'].str.contains(r'\d+\.{}$'.format(x))]
    out.extend(a.union(b).tolist())
print (out)
[4, 3, 2]
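Another option for the ordered case, assuming the only prefix to ignore is a leading number followed by a dot: strip that prefix once with Series.str.replace, after which a plain equality test per term covers both the exact and the numbered entries. The name normalized is hypothetical.

```python
import pandas as pd

df2 = pd.DataFrame({'level_0': ['No case notes', 'notes', '1.Chinese', '2.Widowed', 'No']})
input_terms = ['No', 'Widowed', 'Chinese']

# Remove a leading "<digits>." so '1.Chinese' becomes 'Chinese';
# values without that prefix are left unchanged
normalized = df2['level_0'].str.replace(r'^\d+\.', '', regex=True)

out = []
for term in input_terms:
    # All row labels whose cleaned value equals this term exactly
    out.extend(normalized.index[normalized == term].tolist())

print(out)  # [4, 3, 2]
```

Because the comparison is exact after cleaning, 'No case notes' and 'notes' never match 'No'.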
Try this regex:
^[^a-zA-Z]*XXX[^a-zA-Z]*$
Replace XXX with the search terms (remember to escape them!). For example:
^[^a-zA-Z]*(?:Chinese|No|Widowed)[^a-zA-Z]*$
This is kind of a mix between str.contains and exact matches. It basically ignores certain characters (in this case, everything that is not a-zA-Z) and then does an exact match. If you want to ignore a different set of characters, just change the two character classes at the two ends. For example, since a space is not a letter and is therefore ignored by default, use this if spaces should not be ignored:
^[^a-zA-Z\s]*XXX[^a-zA-Z\s]*$
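A quick check with Python's re module shows the effect of adding \s to the negated classes. The sample strings are made up for illustration, and 'Chinese' stands in for XXX.

```python
import re

term = re.escape('Chinese')

# [^a-zA-Z] matches any non-letter, so spaces count as ignorable
loose = r'^[^a-zA-Z]*{}[^a-zA-Z]*$'.format(term)
# Adding \s to the negated class means spaces are no longer ignorable
strict = r'^[^a-zA-Z\s]*{}[^a-zA-Z\s]*$'.format(term)

samples = ['1.Chinese', '1. Chinese', 'Chinese food']
results = {s: (bool(re.search(loose, s)), bool(re.search(strict, s))) for s in samples}

for s, (loose_ok, strict_ok) in results.items():
    print(repr(s), 'loose:', loose_ok, 'strict:', strict_ok)
```

Only '1. Chinese' differs: the space is skipped by the first pattern but blocks the second; 'Chinese food' fails both because letters follow the term.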
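To use the character-class regex with the question's dataframe, one option (a sketch, not from the original answer) is to wrap the alternation in a capturing group and call Series.str.extract, which returns the matched term or NaN. The extracted terms also make it easy to reorder the hits by input_terms; the names matched, hits and ordered are hypothetical.

```python
import re
import pandas as pd

df2 = pd.DataFrame({'level_0': ['No case notes', 'Notes', '1.Chinese', '2.Widowed', 'No']})
input_terms = ['No', 'Widowed', 'Chinese']

# Same pattern as above, with a capturing group around the escaped terms
alternation = '|'.join(map(re.escape, input_terms))
pat = r'^[^a-zA-Z]*({})[^a-zA-Z]*$'.format(alternation)

# One group + expand=False gives a Series; non-matching rows become NaN
matched = df2['level_0'].str.extract(pat, expand=False)
hits = matched.dropna()

# Matching rows in dataframe order
idx = hits.index.tolist()
print(idx)  # [2, 3, 4]

# Matching rows in input_terms order
ordered = [label for t in input_terms for label in hits.index[hits == t].tolist()]
print(ordered)  # [4, 3, 2]
```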