Reputation: 35
I am trying to filter out all rows in a DataFrame that contain the substring '**'.
I have tried doing this with
df = df[~df['title'].str.contains('**')]
However, I keep getting this error:
error: nothing to repeat at position 0
and can't figure out why.
Upvotes: 0
Views: 103
Reputation: 51395
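You have to escape each * character with a backslash (\), because * is being read as the special regex quantifier meaning "zero or more of the preceding element". With nothing before the first * to repeat, the pattern fails to compile. In your case (using a raw string so the backslashes reach the regex engine unchanged):
df[~df['title'].str.contains(r'\*\*')]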
Example:
>>> df
   title
0    xyz
1  x**yz
2     **
3     x*
>>> df[~df['title'].str.contains(r'\*\*')]
  title
0   xyz
3    x*
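If the literal text comes from a variable, one option is to let re.escape build the escaped pattern rather than writing the backslashes by hand. The snippet below is only a sketch using the standard re module on a plain list (the names literal, titles and kept are made up for illustration); it also reproduces the error from the question:

```python
import re

literal = "**"  # the text we want to match literally

# Compiling the unescaped text fails: the leading '*' has nothing to repeat.
try:
    re.compile(literal)
    compile_error = None
except re.error as exc:
    compile_error = str(exc)

# re.escape backslash-escapes the regex metacharacters in the literal.
escaped = re.escape(literal)

# Same sample values as the example above, filtered with the escaped pattern.
titles = ["xyz", "x**yz", "**", "x*"]
kept = [t for t in titles if not re.search(escaped, t)]

print(compile_error)
print(escaped)
print(kept)
```

The escaped string can then be passed to str.contains in place of the hand-written pattern.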
Upvotes: 3
Reputation: 29710
By default, str.contains treats the pattern as a regular expression and uses re.search, which interprets * as a special character (a quantifier meaning zero or more repetitions of the preceding element). Call contains('**', regex=False) instead to skip re.search and do a plain literal substring test, like the Python in operator.
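A minimal sketch of that approach, assuming a DataFrame with the same sample titles as in the other answer (the names mask and filtered are just for illustration):

```python
import pandas as pd

# Sample data assumed for illustration; the column name matches the question.
df = pd.DataFrame({"title": ["xyz", "x**yz", "**", "x*"]})

# regex=False makes '**' a literal substring, so no escaping is needed.
mask = df["title"].str.contains("**", regex=False)

# Keep only the rows whose title does NOT contain '**'.
filtered = df[~mask]
print(filtered)
```

If the column can contain missing values, passing na=False to str.contains as well should keep the mask strictly boolean.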
Upvotes: 2