Reputation: 404
For machine learning purposes, I need to "clean" some text that I am extracting, so I tried this:
texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
texto = texto.replace(r"_{2,}"," ")
print(texto)
But the result was not what I expected:
sdf sdf s _ sfsf sdfs _________ sfsdf
I would like:
sdf sdf s _ sfsf sdfs sfsdf
Upvotes: 4
Views: 2740
Reputation: 100
I would simply try to remove 3 consecutive underscores like this:
texto = texto.replace(r"___","")
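Note that str.replace() treats its argument as literal text, so this removes underscores three at a time, and a run whose length is not a multiple of three leaves a remainder behind. A quick check, using made-up sample strings rather than the original data:

```python
# str.replace matches the literal text "___", not a pattern.
# The sample strings are hypothetical: runs of 3, 4 and 9 underscores.
samples = ["a ___ b", "a ____ b", "a _________ b"]

results = [s.replace("___", "") for s in samples]
for before, after in zip(samples, results):
    print(repr(before), "->", repr(after))
```

The run of four underscores leaves a single underscore behind, so this only works if the runs always have a length divisible by three.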
Upvotes: 0
Reputation: 43169
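You could use a regular expression instead, since str.replace() treats its first argument as literal text, not as a pattern: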
import re
texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
rx = re.compile(r'_{2,}')
texto = rx.sub('', texto)
Which yields
sdf sdf s _ sfsf sdfs  sfsdf
If you want to remove the trailing whitespace as well, change the expression to
rx = re.compile(r'_{2,}\s*')
Then the output would be
sdf sdf s _ sfsf sdfs sfsdf
#                   ^^^
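If the goal is to end up with exactly one space between the remaining tokens, another option is to match the run together with the whitespace on both sides and substitute a single space. A minimal sketch of that idea; the helper name clean_underscores and the constant UNDERSCORE_RUN are made up for illustration:

```python
import re

# Two or more underscores, plus any whitespace on either side of the run.
UNDERSCORE_RUN = re.compile(r"\s*_{2,}\s*")

def clean_underscores(text):
    """Replace each run of 2+ underscores (and adjacent whitespace) with one space."""
    # strip() drops the space left behind when a run sits at the start or end.
    return UNDERSCORE_RUN.sub(" ", text).strip()

texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
cleaned = clean_underscores(texto)
print(cleaned)
```

A single underscore is left alone because the pattern requires at least two in a row.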
Upvotes: 8