SCool

Reputation: 3375

Why isn't my regex working with str.contains?

I have a very simple search string. I am looking for a shop called "Lidl".

My dataframe:

  term_location  amount
0          Lidl    2.28
1          Lidl   16.97
2          Lidl    2.28
3          Lidl   16.97
4          Lidl   16.97
5          Lidl   16.97
6          Lidl   16.97
7          Lidl   16.97
8          Lidl   16.97
9          Lidl   16.97

Here I am searching for a regex version of Lidl:

r = r'\blidl\b'

r = re.compile(r)


df[df.term_location.str.contains(r,re.IGNORECASE,na=False)]

This brings back an empty dataframe.

However, if I just put the simple string in str.contains(), it works and I get the dataframe of Lidls returned:

df[df.term_location.str.contains('lidl',case=False,na=False)]

I would prefer to be able to use regex, as I have a few more conditions to build into the query.

So what's happening? I can't figure it out.

Practice dataframe for pd.DataFrame.from_dict():

{'term_location': {0: 'Lidl',
  1: 'Lidl',
  2: 'Lidl',
  3: 'Lidl',
  4: 'Lidl',
  5: 'Lidl',
  6: 'Lidl',
  7: 'Lidl',
  8: 'Lidl',
  9: 'Lidl'},
 'amount': {0: 2.28,
  1: 16.97,
  2: 2.28,
  3: 16.97,
  4: 16.97,
  5: 16.97,
  6: 16.97,
  7: 16.97,
  8: 16.97,
  9: 16.97}}
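A minimal reproduction using the practice data above. It assumes a pandas version in which Series.str.contains takes its parameters in the order pat, case, flags, na, regex. Under that assumption, the second positional argument is case, not flags, so re.IGNORECASE in that slot is only a truthy case value. No flag is added, and the compiled pattern stays case-sensitive. The sketch passes the value as case= explicitly to make that binding visible. The term_location column is pinned to object dtype to match the question, because newer pandas versions may infer a dedicated string dtype instead.

```python
import re

import pandas as pd

# Same values as the practice dict in the question.
data = {
    "term_location": {i: "Lidl" for i in range(10)},
    "amount": {0: 2.28, 1: 16.97, 2: 2.28, **{i: 16.97 for i in range(3, 10)}},
}
df = pd.DataFrame.from_dict(data)
# Pin to object dtype, as in the question (newer pandas may infer a string dtype).
df = df.astype({"term_location": object})

pattern = re.compile(r"\blidl\b")  # compiled WITHOUT re.IGNORECASE: case-sensitive

# Passing re.IGNORECASE as the 2nd positional argument is the same as this:
# it lands in `case` (truthy, so still case-sensitive), not in `flags`.
wrong = df[df.term_location.str.contains(pattern, case=re.IGNORECASE, na=False)]
print(len(wrong))  # 0

# Fix: bake the flag into the compiled pattern and leave case/flags alone.
fixed = re.compile(r"\blidl\b", re.IGNORECASE)
right = df[df.term_location.str.contains(fixed, na=False)]
print(len(right))  # 10
```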

Upvotes: 2

Views: 1397

Answers (2)

Ryszard Czech

Reputation: 18611

Use a string literal as the pattern argument; it will be parsed as a regular expression:

df[df.term_location.str.contains(r'\blidl\b',case=False,na=False)]
                                   ^^^^^^^^^ 

case=False acts identically to re.IGNORECASE.
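A small sketch comparing case=False with passing re.IGNORECASE through the flags keyword (not positionally). The sample strings are made up for illustration, and the Series is pinned to object dtype on the assumption that this matches the question's column.

```python
import re

import pandas as pd

# Hypothetical sample values: three spellings of Lidl plus two non-matches.
s = pd.Series(["Lidl", "LIDL", "lidl", "Lidlx", "Aldi"], dtype=object)
pattern = r"\blidl\b"  # plain string pattern, compiled by pandas

by_case = s.str.contains(pattern, case=False, na=False)
by_flags = s.str.contains(pattern, flags=re.IGNORECASE, na=False)

print(by_case.tolist())  # [True, True, True, False, False]
print(by_case.equals(by_flags))  # True
```

"Lidlx" is rejected because the trailing \b needs a non-word character (or the end of the string) after "lidl".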

Alternatively, use (?i):

df[df.term_location.str.contains(r'(?i)\blidl\b',na=False)]
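(?i) is an inline flag: written at the start of the pattern, it has the same effect as compiling with re.IGNORECASE. The sketch below checks that with the plain re module on made-up strings. Since Python 3.11, such a global inline flag must appear at the very start of the pattern, otherwise re raises an error.

```python
import re

# Hypothetical strings for illustration.
words = ["Lidl", "LIDL", "lidl", "Aldi"]

inline = [bool(re.search(r"(?i)\blidl\b", w)) for w in words]
flagged = [bool(re.search(r"\blidl\b", w, re.IGNORECASE)) for w in words]

print(inline)  # [True, True, True, False]
print(inline == flagged)  # True
```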

Upvotes: 1

user13893607


Your regular expression is not working because it is case-sensitive: it only matches the word "lidl" exactly as written (in lowercase), while your data contains "Lidl".

You should either change the first character of the word to uppercase:

re.compile(r"\bLidl\b")

or use the re.IGNORECASE flag in order to match the word regardless of its case:

re.compile(r"\blidl\b", re.IGNORECASE)
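A quick comparison of the original pattern and the two fixes above, using only the re module. The sample spellings are hypothetical. It shows the practical difference between the fixes: the capitalised pattern matches only "Lidl" exactly, while the re.IGNORECASE version accepts any casing.

```python
import re

patterns = {
    "original": re.compile(r"\blidl\b"),  # case-sensitive, lowercase only
    "capitalised": re.compile(r"\bLidl\b"),  # case-sensitive, exact "Lidl"
    "ignorecase": re.compile(r"\blidl\b", re.IGNORECASE),  # any casing
}
samples = ["Lidl", "LIDL"]  # made-up spellings for illustration

results = {
    name: [bool(p.search(s)) for s in samples] for name, p in patterns.items()
}
print(results)
# {'original': [False, False], 'capitalised': [True, False], 'ignorecase': [True, True]}
```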

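Keep in mind that \b matches at a word boundary: a position between a word character (letter, digit or underscore) and a non-word character, or the start or end of the string. Because "_" is a word character, there is no boundary before the "L" in "_Lidl", so "_Lidl" wouldn't match any of the regular expressions above.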
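A short probe of where \b does and doesn't match, using the case-insensitive pattern from above. The shop names are hypothetical, chosen to put word and non-word characters on either side of "Lidl".

```python
import re

pattern = re.compile(r"\blidl\b", re.IGNORECASE)

# Hypothetical inputs: punctuation and spaces create boundaries;
# underscores and letters do not.
samples = ["Lidl", "Lidl Express", "Lidl's", "Lidl-2", "_Lidl", "Lidl_2", "Lidlx"]
results = {s: bool(pattern.search(s)) for s in samples}

for name, matched in results.items():
    print(f"{name!r}: {matched}")
```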

Upvotes: 2
