celsowm

Reputation: 404

How to replace two or more underscores using Python?

For machine learning purposes, I need to "clean" some text that I am extracting, so I tried this:

texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
texto = texto.replace(r"_{2,}"," ")
print(texto)

But the result was not what I expected:

sdf sdf s _ sfsf sdfs _________ sfsdf

I would like to get:

sdf sdf s _ sfsf sdfs  sfsdf
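The output is unchanged because `str.replace` treats its first argument as literal text, not as a regular expression: it searches for the five characters `_{2,}`, which never occur in `texto`. A small sketch that confirms this:

```python
texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"

# str.replace searches for the literal characters "_{2,}", not a regex pattern
print("_{2,}" in texto)                        # False: that literal text is absent
print(texto.replace(r"_{2,}", " ") == texto)   # True: nothing was replaced

# The same call does replace the literal text when it actually appears
print("a_{2,}b".replace(r"_{2,}", " "))        # a b
```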

Upvotes: 4

Views: 2740

Answers (2)

Chets

Reputation: 100

I would simply try to remove 3 consecutive underscores like this:

texto = texto.replace("___", "")
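Because `str.replace` works on literal text, this removes each complete group of three underscores. That cleans the question's example, where the nine underscores are exactly three groups of three, but runs whose length is not a multiple of three leave underscores behind. A quick check, using a hypothetical helper `remove_triples` that only wraps the call above (the extra sample strings are illustrative):

```python
def remove_triples(text: str) -> str:
    # Hypothetical wrapper around the literal (non-regex) replacement above
    return text.replace("___", "")

print(remove_triples("sdf sdf s _ sfsf sdfs _________ sfsdf"))  # 9 underscores: all removed
print(remove_triples("a ____ b"))  # 4 underscores: one "_" is left over
print(remove_triples("a __ b"))    # 2 underscores: nothing is removed
```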

Upvotes: 0

Jan

Reputation: 43169

You could use

import re
texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
rx = re.compile(r'_{2,}')

texto = rx.sub('', texto)

Which yields

sdf sdf s _ sfsf sdfs  sfsdf
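Unlike a literal replacement, the quantifier `{2,}` lets the pattern match a run of any length from two upward, while a single `_` is left alone. A self-contained check on a few sample strings (the extra samples are illustrative, not from the question):

```python
import re

# _{2,} matches a run of two or more underscores; a lone "_" is left untouched
rx = re.compile(r"_{2,}")

samples = [
    "sdf sdf s _ sfsf sdfs _________ sfsdf",  # 9 underscores
    "a __ b",                                 # 2 underscores
    "a ____ b",                               # 4 underscores
]
for s in samples:
    print(repr(rx.sub("", s)))
# 'sdf sdf s _ sfsf sdfs  sfsdf'
# 'a  b'
# 'a  b'
```

For a one-off call, `re.sub(r"_{2,}", "", texto)` gives the same result without the explicit `re.compile`.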

If you want to remove the trailing whitespace as well, change the expression to

rx = re.compile(r'_{2,}\s*')

Then the output would be

sdf sdf s _ sfsf sdfs sfsdf
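One caveat with both patterns: when the underscores sit directly between two words, as in `word__word`, removing them joins the words (both `_{2,}` and `_{2,}\s*` turn it into `wordword`). If that matters for the cleaned text, one option, sketched here with a hypothetical helper `strip_underscore_runs`, is to consume the whitespace on both sides of the run and put back a single space:

```python
import re

def strip_underscore_runs(text: str) -> str:
    # Hypothetical helper: replace a run of 2+ underscores, together with any
    # whitespace around it, by one space, then trim both ends of the result
    return re.sub(r"\s*_{2,}\s*", " ", text).strip()

print(strip_underscore_runs("sdf sdf s _ sfsf sdfs _________ sfsdf"))
# sdf sdf s _ sfsf sdfs sfsdf
print(strip_underscore_runs("word__word"))  # word word
print(strip_underscore_runs("__start"))     # start
```

The final `.strip()` trims whitespace at both ends of the whole string, so drop it if leading or trailing spaces must be preserved.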
#                   ^^^

Upvotes: 8
