iceokoli

Reputation: 35

Filter out all rows in a dataframe containing '**'

I am trying to filter out all rows in a DataFrame that contain the substring '**'.

I have tried doing this with

df = df[~df['title'].str.contains('**')]

However, I keep getting this error:

error: nothing to repeat at position 0

and can't figure out why.

Upvotes: 0

Views: 103

Answers (2)

sacuL

Reputation: 51395

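You have to escape each * with a backslash, because str.contains treats its argument as a regular expression by default, and * is a regex quantifier (meaning zero or more repetitions of the preceding element). With nothing before the first *, there is nothing to repeat, hence the error. In your case: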

df[~df['title'].str.contains(r'\*\*')]

Example:

>>> df
   title
0    xyz
1  x**yz
2     **
3     x*

>>> df[~df['title'].str.contains(r'\*\*')]

  title
0   xyz
3    x*
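
If you would rather not write the backslashes by hand, re.escape from the standard library can build the escaped pattern for you. The sketch below is only an illustration using plain re (no pandas); the titles list and the variable names are made up for the example and simply mirror the sample data above.

```python
import re

pattern = '**'

# A bare '*' has nothing before it to repeat, so compiling the pattern fails.
try:
    re.compile(pattern)
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    message = str(exc)

# re.escape backslash-escapes regex metacharacters so the text matches literally.
escaped = re.escape(pattern)

# Hypothetical sample data, same values as the 'title' column above.
titles = ['xyz', 'x**yz', '**', 'x*']
kept = [t for t in titles if not re.search(escaped, t)]
print(kept)
```

The same escaped string should work as the argument to str.contains, e.g. df['title'].str.contains(re.escape('**')).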

Upvotes: 3

miradulo

Reputation: 29710

By default, str.contains treats the pattern as a regular expression and matches it with re.search, where * is a special character (a quantifier meaning zero or more repetitions of the preceding element). Call contains('**', regex=False) to skip re.search and do a plain substring test instead, like Python's in operator.
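
A minimal sketch of this approach, assuming a DataFrame with a 'title' column like the one in the question (the sample values here are invented for illustration):

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({'title': ['xyz', 'x**yz', '**', 'x*']})

# regex=False treats '**' as a literal substring rather than a regex pattern.
mask = df['title'].str.contains('**', regex=False)

# Keep only the rows whose title does NOT contain '**'.
filtered = df[~mask]
print(filtered)
```

If the column may contain missing values, passing na=False to contains keeps the mask boolean so that ~mask still works.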

Upvotes: 2
