Mazze

Reputation: 443

How to substitute different number of digits behind special character with regex

I have the following kind of string: "§ 9,12,14,15 und 16" or "§ 9,12 und 16".

I want to change the string to "§ 9, § 12, § 14, § 15 und §16" or "§ 9, § 12 und § 16", respectively.

The number of numbers in the list varies, and I want a code snippet that works for lists of any length:

import re

text = "§§ 9,12,14,15 und 16"
text = re.sub(r'§* (\d+),(\d+),(\d+),(\d+) und (\d+)', r'§ \1, § \2, § \3, § \4 und § \5', text)

I only manage to match the string if I know the number of digits.

Upvotes: 1

Views: 197

Answers (2)

Cary Swoveland

Reputation: 110675

You can do that by using re.sub with a regular expression and a lambda for replacements.

import re

text = "aaa § 9,12,14,15 und 16 bbb"
rgx = r'(?:,|(?<!§) )(?=\d)'
re.sub(rgx, lambda m: ', § ' if m.group() == ',' else ' §', text)
# returns "aaa § 9, § 12, § 14, § 15 und §16 bbb"


The regular expression can be broken down as follows.

(?:       # begin a non-capture group
  ,       # match a comma
  |       # or
  (?<!§)  # next character cannot be preceded by '§'
  [ ]     # match a space  
)         # end non-capture group
(?=\d)    # next character must be a digit 

(?<!§) is a negative lookbehind; (?=\d) is a positive lookahead. I've placed the space in a character class ([ ]) merely to make it visible.
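To check that the pattern does not depend on how many numbers are listed, here is a small sketch that wraps the substitution in a helper and runs it on both strings from the question. The helper name add_section_signs is made up for illustration.

```python
import re

# Same pattern as above: a comma, or a space not preceded by '§',
# each immediately followed by a digit.
RGX = re.compile(r'(?:,|(?<!§) )(?=\d)')

def add_section_signs(text):
    """Insert '§' before every number in a '§ 9,12 und 16'-style list."""
    return RGX.sub(lambda m: ', § ' if m.group() == ',' else ' §', text)

print(add_section_signs("§ 9,12,14,15 und 16"))  # § 9, § 12, § 14, § 15 und §16
print(add_section_signs("§ 9,12 und 16"))        # § 9, § 12 und §16
```

Returning ' § ' instead of ' §' in the else branch would give "und § 16". Note that the pattern also fires on any other space that precedes a digit and does not follow '§', so it is best applied to strings that contain little besides the list.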

Upvotes: 1

tripleee

Reputation: 189357

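A single regex with one capture group per number, like the one in your question, cannot handle a varying count, because the number of groups is fixed by the pattern. What you can do instead is split your string into parts and perform a substitution on each.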

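import re

text = "§§ 9,12,14,15 und 16"
# group 2: the comma-separated numbers; group 3: the number after "und"
parts = re.search(r'(§*)\s*((?:\d+,?\s*)+)\s*und\s+(\d+)', text)
if parts:
    sections = parts.group(2)
    # prefix every number with "§ "
    text = re.sub(r'(\d+)', r'§ \1', sections) + ' und § ' + parts.group(3)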

The spacing in your example ends up being a bit irregular but this can be fixed up with some light post-processing.

text = re.sub(r',(?!\s)', ', ', re.sub(r'\s+', ' ', text))
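As a rough sketch of the same split-then-substitute idea in one step, the variant below uses a replacement function so that any text around the list is preserved and the spacing comes out normalized without a separate pass. The names LIST_RGX and expand_sections are hypothetical, and the pattern assumes the list always ends in "und" followed by a final number.

```python
import re

# One or more '§', a comma-separated list of numbers, then 'und' and a last number.
LIST_RGX = re.compile(r'§+\s*(\d+(?:\s*,\s*\d+)*)\s+und\s+(\d+)')

def expand_sections(text):
    """Rewrite each '§ 9,12 und 16'-style list as '§ 9, § 12 und § 16'."""
    def repl(match):
        # Split the comma-separated part into its individual numbers.
        numbers = re.findall(r'\d+', match.group(1))
        head = ', '.join('§ ' + n for n in numbers)
        return head + ' und § ' + match.group(2)
    return LIST_RGX.sub(repl, text)

print(expand_sections("§§ 9,12,14,15 und 16"))   # § 9, § 12, § 14, § 15 und § 16
print(expand_sections("aaa § 9,12 und 16 bbb"))  # aaa § 9, § 12 und § 16 bbb
```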

Upvotes: 1
