Preston

Reputation: 8187

Replace all hyphens except in between two digits

Following up on the question here, I'm trying to replace every hyphen that does not appear in a US postal code.

The logic is:

I've tried to achieve this using:

import re
p = re.compile(r'(?!\d+\-\d+)-') # regex here
test_str = "12345-4567 hello-you"
re.sub(p, " ", test_str)

What am I doing wrong?

Upvotes: 2

Views: 225

Answers (1)

Wiktor Stribiżew

Reputation: 626903

You may use

import re
p = re.compile(r'(?!(?<=\d)-\d)-')
test_str = "12345-4567 hello-you 45-year N-45"
print(re.sub(p, " ", test_str))
# => 12345-4567 hello you 45 year N 45

See the Python demo and the regex demo.

The (?!(?<=\d)-\d)- regex matches:

  • (?!(?<=\d)-\d) - a location in the string that is not immediately followed by a hyphen that is itself preceded by a digit and followed by a digit
  • - - a hyphen.

In other words, a hyphen is matched unless it sits directly between two digits.
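To see why the pattern from the question replaces every hyphen, it may help to run both patterns side by side. The snippet below is only an illustrative sketch: the variable names are made up here, and alt_pattern is a different formulation that is not part of the answer above, although it should behave the same way (a hyphen matches if it is not preceded by a digit or not followed by one).

```python
import re

test_str = "12345-4567 hello-you 45-year N-45"

# Pattern from the question: the lookahead is evaluated at the hyphen itself,
# where \d+ can never match, so the negative lookahead always succeeds
# and every hyphen gets replaced.
question_pattern = re.compile(r'(?!\d+\-\d+)-')

# Pattern from the answer: skip a hyphen only when it is between two digits.
answer_pattern = re.compile(r'(?!(?<=\d)-\d)-')

# Hypothetical alternative spelling of the same condition (an assumption,
# not from the answer): no digit before the hyphen OR no digit after it.
alt_pattern = re.compile(r'(?<!\d)-|-(?!\d)')

results = {
    "question": question_pattern.sub(" ", test_str),
    "answer": answer_pattern.sub(" ", test_str),
    "alternative": alt_pattern.sub(" ", test_str),
}

for name, value in results.items():
    print(name, "=>", value)
# question => 12345 4567 hello you 45 year N 45
# answer => 12345-4567 hello you 45 year N 45
# alternative => 12345-4567 hello you 45 year N 45
```

Note that all three patterns key on digits around the hyphen, not on the 5+4 postal code shape, so a string such as 1-2 would also keep its hyphen with the last two.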

Another approach is to match and capture postal-code-like strings in order to keep them, and to replace - in all other contexts:

re.sub(r'\b(\d{5}-\d{4})\b|-', r'\1 ', text)

See the regex demo and the Python demo.

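Note that \b(\d{5}-\d{4})\b first matches a word boundary, then captures five digits, a hyphen and four digits into Group 1, and finally matches another word boundary. The \1 backreference in the replacement pattern refers to the value captured in Group 1. When the bare - alternative matches instead, Group 1 did not participate in the match, and Python 3.5+ substitutes an empty string for it.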
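One thing to watch with the r'\1 ' replacement: the trailing space is also appended after a postal code that is kept, so "12345-4567 hello-you" comes back with two spaces after the code. A possible workaround is to pass a function as the replacement. This is a minimal sketch assuming Python 3; POSTAL_OR_HYPHEN and replace_hyphens are hypothetical names introduced here, not part of the answer.

```python
import re

# Same alternation as above: Group 1 captures a ZIP+4-like string,
# the second branch matches any other hyphen.
POSTAL_OR_HYPHEN = re.compile(r'\b(\d{5}-\d{4})\b|-')

def replace_hyphens(text):
    """Replace hyphens with spaces, leaving ZIP+4-like strings untouched."""
    def repl(match):
        # Group 1 is None when the bare hyphen branch matched.
        if match.group(1) is not None:
            return match.group(1)  # keep the postal code exactly as found
        return " "                 # any other hyphen becomes a space
    return POSTAL_OR_HYPHEN.sub(repl, text)

print(replace_hyphens("12345-4567 hello-you 45-year N-45"))
# => 12345-4567 hello you 45 year N 45
```

Unlike the lookaround pattern, this version only protects strings shaped like five digits, a hyphen and four digits, so something like 1-2 would become 1 2.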

Upvotes: 3
