Reputation: 8197
I'm trying to remove spaces between numbers immediately following a string (po box in this case).
I can achieve this using re.match and some replace logic, but it sure would be prettier if I could do this with a conditional re.sub.
How can I integrate a capture group into a regex sub?
I've tried many different versions of the code immediately below, but to no avail:
my_string = "po box 12 34 5 heisenburg 902 rr 15"
re.sub(r'(?:po box)([0-9 ]+)', r'', my_string)
Expected results:
po box 12 59 76 => po box 125976
po box 56 56 barry 56 87 => po box 5656 barry 56 87
barry box 56 87 => barry box 56 87
I've put this together, which has the desired effect but is not ideal:
import re

my_string = "po box 12 34 5 heisenburg 902 rr 15"
match = re.match(r'po box([0-9 ]+)', my_string)
if match:
    # remove spaces between numbers
    spaceless_numbers = match.group(1).replace(' ', '')
    # get original string positions of group 1
    start, end = match.span(1)
    # get start and end portions of the original string
    first_part = my_string[:start]
    second_part = my_string[end:]
    # concatenate start + spaces removed section + end
    print('{} {} {}'.format(first_part, spaceless_numbers, second_part).strip())
Upvotes: 2
Views: 53
Reputation: 627607
You can use
import re
my_strings = ["po box 12 34 5 heisenburg 902 rr 15", "po box 12 59 76","po box 56 56 barry 56 87","barry box 56 87"]
p = re.compile(r'\b(po\s+box\s*)(\d+(?:\s+\d+)+)')
for s in my_strings:
    print(s, ' => ', p.sub(lambda x: f"{x.group(1)}{''.join(c for c in x.group(2).split())}", s))
See the Python demo. Output:
po box 12 34 5 heisenburg 902 rr 15 => po box 12345 heisenburg 902 rr 15
po box 12 59 76 => po box 125976
po box 56 56 barry 56 87 => po box 5656 barry 56 87
barry box 56 87 => barry box 56 87
The regex is
\b(po\s+box\s*)(\d+(?:\s+\d+)+)
See the regex demo. Details:
\b - a word boundary
(po\s+box\s*) - Group 1: po, 1+ whitespaces, box, 0+ whitespaces
(\d+(?:\s+\d+)+) - Group 2: 1+ digits and one or more occurrences of 1+ whitespaces and 1+ digits
The f"{x.group(1)}{''.join(c for c in x.group(2).split())}" replacement is a concatenation of Group 1 and Group 2 with all whitespace removed.
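If the lambda feels dense, the same idea can be written with a named callback. This is only a sketch of the approach described above: the names PO_BOX, _join_digits and squash_po_box are made up for illustration, and it strips whitespace with a nested re.sub instead of split/join, which should be equivalent here.

```python
import re

# Same pattern as above: group 1 is the "po box" prefix,
# group 2 is the run of whitespace-separated numbers.
PO_BOX = re.compile(r'\b(po\s+box\s*)(\d+(?:\s+\d+)+)')


def _join_digits(match):
    # Hypothetical callback: keep the prefix verbatim and drop
    # all whitespace from the numbers that follow it.
    return match.group(1) + re.sub(r'\s+', '', match.group(2))


def squash_po_box(text):
    # Hypothetical helper: re.sub calls _join_digits once per match
    # and substitutes whatever string it returns.
    return PO_BOX.sub(_join_digits, text)


print(squash_po_box("po box 12 34 5 heisenburg 902 rr 15"))
```

Because group 2 requires at least two numbers separated by whitespace, a string with a single number after po box has nothing to join and comes back unchanged.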
Upvotes: 2