Reputation: 125
Is there a way to remove all punctuation from a string while retaining hyphens and punctuation between digits?
Hello! this episode is thirty-five minutes long, 35.26 mins to be precise.
should be:
Hello this episode is thirty-five minutes long 35.26 mins to be precise
Upvotes: 2
Views: 260
Reputation: 43169
A variant can be achieved with the newer regex module (a third-party package, installed separately from the standard re module):
\w+[-.]+\w+(*SKIP)(*FAIL)|[!,.]+
Broken down:
\w+[-.]+\w+ # 1+ word characters, then 1+ of - or ., then 1+ word characters
(*SKIP)(*FAIL) # discard what was matched so far and do not retry inside it
| # or
[!,.]+ # one or more of ! , .
Python:
import regex as re
string = "Hello! this episode is thirty-five minutes long, 35.26 mins to be precise."
rx = re.compile(r'\w+[-.]+\w+(*SKIP)(*FAIL)|[!,.]+')
string = rx.sub('', string)
print(string)
# Hello this episode is thirty-five minutes long 35.26 mins to be precise
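If installing the third-party regex module is not an option, a similar effect can be sketched with the standard re module. This is a different technique, not the (*SKIP)(*FAIL) verbs themselves: the pattern matches the tokens to keep first and captures any other punctuation, and a replacement function drops only what was captured. The helper name strip_punct and the class [^\w\s] (anything that is neither a word character nor whitespace) are assumptions made for illustration.

```python
import re

# First alternative: tokens to keep intact (word characters joined by - or .).
# Second alternative: a run of punctuation, captured in group 1.
KEEP_OR_PUNCT = re.compile(r'\w+(?:[-.]\w+)+|([^\w\s]+)')

def strip_punct(text):
    # Hypothetical helper: drop captured punctuation, keep everything else.
    return KEEP_OR_PUNCT.sub(lambda m: '' if m.group(1) else m.group(0), text)

string = "Hello! this episode is thirty-five minutes long, 35.26 mins to be precise."
print(strip_punct(string))
# Hello this episode is thirty-five minutes long 35.26 mins to be precise
```

Unlike [!,.]+, the class [^\w\s] also removes marks such as ? or ;, so narrow it if only a few characters should go.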
Upvotes: 3
Reputation: 402483
You can use re.sub with a positive lookahead:
In [165]: re.sub(r'\W(?=\s|$)', '', s)
Out[165]: 'Hello this episode is thirty-five minutes long 35.26 mins to be precise'
Details
\W # any non-word character (not a letter, digit, or underscore)
(?= # positive lookahead
\s # whitespace
| # regex OR
$ # end of the string
)
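The session above assumes s is already defined. A self-contained sketch of the same substitution, with s set to the sample sentence from the question, also shows one limitation worth knowing: \W consumes a single character, so only the last mark in a run of punctuation is removed.

```python
import re

s = "Hello! this episode is thirty-five minutes long, 35.26 mins to be precise."

# Remove any single non-word character that sits right before whitespace
# or at the end of the string.
result = re.sub(r'\W(?=\s|$)', '', s)
print(result)
# Hello this episode is thirty-five minutes long 35.26 mins to be precise

# Only the character directly before the whitespace matches, so the first
# mark of a run survives.
run_result = re.sub(r'\W(?=\s|$)', '', "Wait!! really")
print(run_result)
# Wait! really
```

Changing \W to \W+ would not help here, since the class also matches whitespace and would swallow the gaps between words; a class such as [^\w\s]+ is one possible adjustment.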
Upvotes: 5