Natasha

Reputation: 1521

Filter rows from a dataframe

I have a string stored in a DataFrame column:

import pandas as pd

df = pd.DataFrame({"ID": 1, "content": "froyay-xcd = (E)-cut-2-froyay-xcd"}, index=[0])
print(df)
idx = df[df['content'].str.contains("froyay-xcd = (E)-cut-2-froyay-xcd")]
print(idx)

I'm trying to find the index of the row that contains a search string, but the following warning occurs:

UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
  return func(self, *args, **kwargs)

I'm not sure why an empty DataFrame is returned even though the search string is present in the column.

Any suggestions will be highly appreciated. I expect the output to return the row stored in the dataframe.

Upvotes: 2

Views: 48

Answers (2)

Dishin H Goyani

Reputation: 7693

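Escape ( and ) with a backslash so they are matched literally instead of being treated as a regex group, then filter with the boolean mask and use .index to get the index of the matching rows:

df[df.content.str.contains(r"froyay-xcd = \(E\)-cut-2-froyay-xcd")].index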
Int64Index([0], dtype='int64')

If the search string contains more regex special characters, it is better to use regex=False, as @jezrael said.
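
A minimal sketch of why the unescaped pattern finds nothing, using the DataFrame from the question. In a regex, (E) is a group that matches just the letter E, so the pattern never matches the literal parentheses in the text. The variable names here are only illustrative.

```python
import re
import pandas as pd

df = pd.DataFrame({"ID": 1, "content": "froyay-xcd = (E)-cut-2-froyay-xcd"}, index=[0])
text = df.loc[0, "content"]

# Unescaped: "(E)" is a capture group matching "E", so the regex looks for
# "froyay-xcd = E-cut-2-froyay-xcd", which is not in the text.
unescaped = re.search("froyay-xcd = (E)-cut-2-froyay-xcd", text)

# Escaped: "\(" and "\)" match the literal parentheses.
escaped = re.search(r"froyay-xcd = \(E\)-cut-2-froyay-xcd", text)

print(unescaped is None, escaped is not None)

# Filter with the boolean mask first, then read the index of the matching rows.
mask = df["content"].str.contains(r"froyay-xcd = \(E\)-cut-2-froyay-xcd")
matching_index = df[mask].index
print(list(matching_index))
```

Calling .index directly on the result of str.contains would return the index of every row, matched or not, which is why the mask is applied to df first.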

Upvotes: 1

jezrael

Reputation: 862651

You can pass regex=False so the search string is treated as plain text rather than as a regular expression; here ( and ) are special regex characters:

idx = df[df['content'].str.contains("froyay-xcd = (E)-cut-2-froyay-xcd", regex=False)]
print(idx)
   ID                            content
0   1  froyay-xcd = (E)-cut-2-froyay-xcd

Or you can escape the special characters with re.escape:

import re

idx = df[df['content'].str.contains(re.escape("froyay-xcd = (E)-cut-2-froyay-xcd"))]
print(idx)
   ID                            content
0   1  froyay-xcd = (E)-cut-2-froyay-xcd
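
To compare the two options side by side, here is a small sketch. The second row is a hypothetical addition (it is not in the question) that shows what the unescaped pattern actually matches; the warning from the question is silenced only to keep the output clean.

```python
import re
import warnings
import pandas as pd

needle = "froyay-xcd = (E)-cut-2-froyay-xcd"
df = pd.DataFrame({
    "ID": [1, 2],
    # Row 2 is hypothetical: the same text without the parentheses.
    "content": [needle, "froyay-xcd = E-cut-2-froyay-xcd"],
})

# Plain substring search, no regex involved.
literal = df["content"].str.contains(needle, regex=False)

# Regex search with every special character escaped.
escaped = df["content"].str.contains(re.escape(needle))

# Unescaped regex search: "(E)" is a group, so only the row without
# parentheses matches. pandas emits the UserWarning from the question here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    as_regex = df["content"].str.contains(needle)

print(literal.tolist())
print(escaped.tolist())
print(as_regex.tolist())
```

regex=False and re.escape select the same rows; re.escape is mainly useful when the literal text has to be embedded in a larger regular expression.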

Upvotes: 1
