user2394303

Why does my python regex not work?

I want to replace every run of a character that occurs more than once in a row. I used Python's re.sub, and my call looks like this: data=re.sub('(.)\1+','##',data). But nothing happened.
Here is my text:

※※※※※※※※※※※※※※※※※Chapter One※※※※※※※※※※※※※※※※※※

This is the begining...

Upvotes: 1

Views: 90

Answers (2)

Ashwini Chaudhary

Reputation: 250891

You need to use a raw string here. In a normal string literal, \1 is an octal escape, so Python replaces it with the character whose code point is 1 before the regex engine ever sees the pattern.

>>> '\1'
'\x01'
>>> chr(0o1)    # octal literal; the old 01 spelling is a syntax error in Python 3
'\x01'
>>> '\101'
'A'
>>> chr(0o101)
'A'

Use a raw string to fix this:

>>> '(.)\1+'
'(.)\x01+'
>>> r'(.)\1+'  #Note the `r`
'(.)\\1+'
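
As a quick check of the point above, the sketch below compares the three spellings of the pattern. The variable names are only for illustration. Doubling the backslash gives the same string as the raw literal, while the unescaped version is one character shorter because \1 has collapsed into a single control character.

```python
raw = r'(.)\1+'        # backslash kept literally for the regex engine
escaped = '(.)\\1+'    # same string, written with an escaped backslash
unescaped = '(.)\1+'   # \1 becomes the single character '\x01'

print(raw == escaped)             # True
print(len(raw), len(unescaped))   # 6 5
print('\x01' in unescaped)        # True
```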

Upvotes: 3

user2357112

Reputation: 280291

Use a raw string, so the regex engine interprets backslashes instead of the Python parser. Just put an r in front of the string:

data=re.sub(r'(.)\1+', '##', data)
            ^ this r is the important bit

Otherwise, \1 is interpreted as character value 1 instead of a backreference.
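
A minimal sketch of the difference, assuming a shortened version of the text from the question (fewer ※ characters, otherwise the same):

```python
import re

# Shortened stand-in for the question's text (an assumption for illustration).
data = "※※※※Chapter One※※※※\n\nThis is the begining..."

# Without the r prefix, the pattern is "any character followed by one or more
# '\x01' characters", which never occurs in this text, so nothing is replaced.
broken = re.sub('(.)\1+', '##', data)

# With the r prefix, \1 reaches the regex engine as a backreference to group 1.
fixed = re.sub(r'(.)\1+', '##', data)

print(broken == data)  # True
print(fixed)
# ##Chapter One##
#
# This is the begining##
```

The two newlines survive because . does not match a newline unless re.DOTALL is passed.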

Upvotes: 1
