Jens de Bruijn

Reputation: 969

Regex: match consecutive punctuation marks and replace by the first

I am trying to collapse runs of consecutive punctuation marks (from a predefined set) into the first mark of each run. For example:

  1. u.s., -> u.s.
  2. u.s. -> u.s.
  3. u.s.! -> u.s.
  4. hiiii!!!, -> hiiii!

I tried the following code:

import re
r = re.compile(r'([.,/#!$%^&*;:{}=-_`~()])*\1')
n = r.sub(r'\1', "ews by almalki : Tornado, flood deaths reach 18 in U.s., more storms ahead ")
print(n)

Upvotes: 3

Views: 1516

Answers (1)

Wiktor Stribiżew

Reputation: 626903

You just need to capture the first punctuation mark in a group, match the rest of the run after it, and replace the whole match with the captured group (\1):

([.,/#!$%^&*;:{}=_`~()-])[.,/#!$%^&*;:{}=_`~()-]+

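Note that the - must be placed at the end (or start) of the character class, or escaped inside it, so that it does not create a range. In your pattern, =-_ is read as the range from = to _, which also matches uppercase letters and does not match a literal hyphen.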
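To illustrate the range pitfall, here is a minimal sketch comparing the two placements of the hyphen. The names range_class and literal_class are made up for this example:

```python
import re

# '=-_' inside a class is a range from '=' (0x3D) to '_' (0x5F),
# which covers the uppercase letters but not '-' (0x2D) itself.
range_class = re.compile(r'[=-_]')

# With '-' moved to the end it is a literal hyphen.
literal_class = re.compile(r'[=_-]')

print(bool(range_class.fullmatch('A')))    # True: 'A' falls inside the range
print(bool(range_class.fullmatch('-')))    # False: the hyphen is not matched
print(bool(literal_class.fullmatch('A')))  # False
print(bool(literal_class.fullmatch('-')))  # True
```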

Details:

  • ([.,/#!$%^&*;:{}=_`~()-]) - capturing group 1, matching one punctuation symbol from the set you defined
  • [.,/#!$%^&*;:{}=_`~()-]+ - one or more further punctuation symbols from the same set, which the replacement drops

Python demo:

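import re
# Group 1 keeps the first punctuation mark; the second class consumes the rest of the run.
r = re.compile(r'([.,/#!$%^&*;:{}=_`~()-])[.,/#!$%^&*;:{}=_`~()-]+')
# Replace each run of 2+ punctuation marks with its first mark only.
n = r.sub(r'\1', "ews by almalki : Tornado, flood deaths reach 18 in U.s., more storms ahead ")
print(n)  # the run '.,' after 'U.s' collapses to '.', giving '... in U.s. more storms ahead '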
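
As a quick check against the four examples from the question, here is a small sketch that wraps the same pattern in a helper. The names PUNCT and collapse_punctuation are only illustrative:

```python
import re

# Same punctuation set as above, with '-' last so it stays literal.
PUNCT = r'[.,/#!$%^&*;:{}=_`~()-]'
pattern = re.compile('(' + PUNCT + ')' + PUNCT + '+')

def collapse_punctuation(text):
    """Replace each run of 2+ punctuation marks with its first mark."""
    return pattern.sub(r'\1', text)

for sample in ['u.s.,', 'u.s.', 'u.s.!', 'hiiii!!!,']:
    print(sample, '->', collapse_punctuation(sample))
# u.s., -> u.s.
# u.s. -> u.s.
# u.s.! -> u.s.
# hiiii!!!, -> hiiii!
```

Single punctuation marks (such as the dots inside u.s.) are left alone, because the pattern requires at least two marks in a row.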

Upvotes: 6
