dhiraj

Reputation: 125

Remove punctuation except intra-digit punctuation

Is there a way to remove all punctuation from a string while keeping hyphens and punctuation between digits?

Hello! this episode is thirty-five minutes long, 35.26 mins to be precise.

should be:

Hello this episode is thirty-five minutes long 35.26 mins to be precise

Upvotes: 2

Views: 260

Answers (2)

Jan

Reputation: 43169

One way is to use the third-party regex module, which supports the (*SKIP)(*FAIL) backtracking verbs:

\w+[-.]+\w+(*SKIP)(*FAIL)|[!,.]+

Broken down:

\w+[-.]+\w+    # 1+ word characters, then one or more of - or ., then 1+ word characters
(*SKIP)(*FAIL) # reject what was matched so far and skip past it
|              # or
[!,.]+         # one or more of ! , .

See a demo on regex101.com.


In Python:

import regex as re

string = "Hello! this episode is thirty-five minutes long, 35.26 mins to be precise."

rx = re.compile(r'\w+[-.]+\w+(*SKIP)(*FAIL)|[!,.]+')
string = rx.sub('', string)
print(string)
# Hello this episode is thirty-five minutes long 35.26 mins to be precise
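
If installing the regex module is not an option, the same "protect one alternative, strip the other" idea can be approximated with the standard-library re module. This is only a sketch: the names PROTECT_OR_STRIP and strip_punct are made up for illustration, and the pattern is the one above with the protected part moved into a capture group that the replacement callback puts back unchanged.

```python
import re

# Group 1 captures tokens to protect (e.g. thirty-five, 35.26);
# the second alternative matches the punctuation to strip.
PROTECT_OR_STRIP = re.compile(r'(\w+[-.]+\w+)|[!,.]+')

def strip_punct(text):
    # m.group(1) is None when only the punctuation branch matched,
    # so protected tokens are re-inserted and punctuation becomes ''.
    return PROTECT_OR_STRIP.sub(lambda m: m.group(1) or '', text)

text = "Hello! this episode is thirty-five minutes long, 35.26 mins to be precise."
print(strip_punct(text))
# Hello this episode is thirty-five minutes long 35.26 mins to be precise
```

As in the original pattern, only !, , and . are stripped; other punctuation would need to be added to the character class.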

Upvotes: 3

cs95

Reputation: 402483

You can use re.sub with a positive lookahead:

In [165]: re.sub(r'\W(?=\s|$)', '', s)
Out[165]: 'Hello this episode is thirty-five minutes long 35.26 mins to be precise'

Details

\W      # any non-word character (not a letter, digit, or underscore)
(?=     # positive lookahead
\s      # whitespace
|       # regex OR
$       # end of string
)
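
One thing to watch for: \W also matches whitespace, so a space that is followed by another space gets removed as well. A possible variation, sketched below under the assumption that spacing should be left untouched, swaps \W for [^\w\s]+ (the function name strip_trailing_punct is hypothetical). Like the original, it only removes punctuation that sits directly before whitespace or the end of the string.

```python
import re

def strip_trailing_punct(text):
    # [^\w\s]+ matches a run of characters that are neither word
    # characters nor whitespace; the lookahead requires whitespace
    # or end of string right after the run.
    return re.sub(r'[^\w\s]+(?=\s|$)', '', text)

s = "Hello! this episode is thirty-five minutes long, 35.26 mins to be precise."
print(strip_trailing_punct(s))
# Hello this episode is thirty-five minutes long 35.26 mins to be precise

# With a double space, \W collapses the spacing; [^\w\s]+ keeps it.
print(repr(re.sub(r'\W(?=\s|$)', '', "Hello!  world.")))
# 'Hello world'
print(repr(strip_trailing_punct("Hello!  world.")))
# 'Hello  world'
```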

Upvotes: 5
