Reputation: 3375
I have a very simple search string. I am looking for a shop called "Lidl".
My dataframe:
term_location amount
0 Lidl 2.28
1 Lidl 16.97
2 Lidl 2.28
3 Lidl 16.97
4 Lidl 16.97
5 Lidl 16.97
6 Lidl 16.97
7 Lidl 16.97
8 Lidl 16.97
9 Lidl 16.97
Here I am searching for a regex version of Lidl:
r = r'\blidl\b'
r = re.compile(r)
df[df.term_location.str.contains(r,re.IGNORECASE,na=False)]
This brings back an empty dataframe.
However, if I just put the simple string in str.contains(), it works and I get the dataframe of Lidls returned:
df[df.term_location.str.contains('lidl',case=False,na=False)]
I would prefer to be able to use regex, as I have a few more conditions to build into the query.
So what's happening? I can't figure it out.
Practice dataframe for pd.DataFrame.from_dict():
{'term_location': {0: 'Lidl',
1: 'Lidl',
2: 'Lidl',
3: 'Lidl',
4: 'Lidl',
5: 'Lidl',
6: 'Lidl',
7: 'Lidl',
8: 'Lidl',
9: 'Lidl'},
'amount': {0: 2.28,
1: 16.97,
2: 2.28,
3: 16.97,
4: 16.97,
5: 16.97,
6: 16.97,
7: 16.97,
8: 16.97,
9: 16.97}}
Upvotes: 2
Views: 1397
Reputation: 18611
Use a string literal as the pattern argument; it will be parsed as a regular expression:
df[df.term_location.str.contains(r'\blidl\b',case=False,na=False)]
                                 ^^^^^^^^^^^
The case=False argument acts identically to re.IGNORECASE.
Alternatively, use the inline (?i) flag:
df[df.term_location.str.contains(r'(?i)\blidl\b',na=False)]
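Both variants can be checked side by side. This is a minimal sketch: the small dataframe below is made up for illustration (mixed casing, one non-matching shop and one missing value), and the names by_case and by_inline are hypothetical.

```python
import pandas as pd

# Illustrative data only (not the question's dataframe): mixed casing,
# a non-matching shop, and a missing value to exercise na=False.
df = pd.DataFrame({
    "term_location": ["Lidl", "LIDL store", "Aldi", None],
    "amount": [2.28, 16.97, 5.00, 1.00],
})

# Option 1: plain string pattern, case-insensitivity via case=False.
by_case = df[df.term_location.str.contains(r"\blidl\b", case=False, na=False)]

# Option 2: the inline (?i) flag carries the case-insensitivity in the pattern.
by_inline = df[df.term_location.str.contains(r"(?i)\blidl\b", na=False)]

print(by_case)
print(by_case.equals(by_inline))
```

Both filters should select the same rows, so either form can serve as a base when more conditions are added to the pattern.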
Upvotes: 1
Reputation:
Your regular expression is not working because you are trying to match the word "lidl" exactly as it is (in lowercase).
You should either change the first character of the word to uppercase:
re.compile(r"\bLidl\b")
or use the re.IGNORECASE flag to match the word regardless of its case:
re.compile(r"\blidl\b", re.IGNORECASE)
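As a hedged illustration of why the call in the question comes back empty: the documented signature is str.contains(pat, case=True, flags=0, na=None, regex=True), so a re.IGNORECASE passed as the second positional argument is bound to case, not to flags, and the compiled pattern stays case-sensitive. The sketch below assumes an object-dtype column and a shortened, made-up dataframe; behavior may differ slightly across pandas versions.

```python
import re
import pandas as pd

# Shortened illustrative data; dtype=object is an assumption made here to
# mirror a classic string column.
df = pd.DataFrame({
    "term_location": pd.Series(["Lidl", "Lidl", "Aldi"], dtype=object),
    "amount": [2.28, 16.97, 5.00],
})

# Case-sensitive compiled pattern, as in the question.
sensitive = re.compile(r"\blidl\b")

# re.IGNORECASE is the second positional argument here, so it fills `case`
# (a truthy value, i.e. case-sensitive matching) rather than `flags`.
empty = df[df.term_location.str.contains(sensitive, re.IGNORECASE, na=False)]

# Putting the flag into the compiled pattern makes the match case-insensitive.
insensitive = re.compile(r"\blidl\b", re.IGNORECASE)
found = df[df.term_location.str.contains(insensitive, na=False)]

print(len(empty), len(found))
```

Note that re.compile() rejects extra flags for an already compiled pattern, so the flag belongs in the original re.compile() call rather than in a later flags= argument.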
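Keep in mind that \b matches a word boundary: a position between a word character (a letter, digit, or underscore) and a non-word character, or the start or end of the text. Because the underscore counts as a word character, "_Lidl" wouldn't match either of the regular expressions above.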
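The word-boundary behaviour can be checked with the re module alone. This is a small sketch with made-up sample strings; the loose pattern at the end is a hypothetical alternative (not something the answer proposes) for cases where an underscore should count as a separator.

```python
import re

pattern = re.compile(r"\blidl\b", re.IGNORECASE)

# Made-up sample strings for illustration.
samples = ["Lidl", "Lidl GmbH", "at lidl.", "_Lidl", "Lidl2", "Lidl-Markt"]

# \b needs a word/non-word transition, so "_" or "2" next to the word blocks it.
matches = {s: bool(pattern.search(s)) for s in samples}
for text, hit in matches.items():
    print(f"{text!r}: {hit}")

# Hypothetical alternative: treat only letters and digits as word characters,
# so an adjacent underscore no longer prevents a match.
loose = re.compile(r"(?<![A-Za-z0-9])lidl(?![A-Za-z0-9])", re.IGNORECASE)
print(bool(loose.search("_Lidl")))
```

With these samples, "_Lidl" and "Lidl2" are rejected by the \b pattern, while the looser pattern accepts "_Lidl" and still rejects "Lidl2".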
Upvotes: 2