Preston

Reputation: 8197

Remove spaces in group of numbers just after a string

I'm trying to remove the spaces between numbers that immediately follow a given string ("po box" in this case).

I can achieve this using re.match and some replace logic, but it would be cleaner if I could do it with a single conditional re.sub.

How can I use a capture group in the replacement of a regex sub?

I've tried many different versions of the code immediately below, but to no avail:

my_string = "po box 12 34 5 heisenburg 902 rr 15"
re.sub(r'(?:po box)([0-9 ]+)', r'', my_string)

Expected results:

po box 12 59 76           => po box 125976
po box 56 56 barry 56 87  => po box 5656 barry 56 87
barry box 56 87           => barry box 56 87

I've put together the following, which has the desired effect but is not ideal:

import re

my_string = "po box 12 34 5 heisenburg 902 rr 15"
match = re.match(r'po box([0-9 ]+)', my_string)

if match:
    # remove spaces between numbers
    spaceless_numbers = match.group(1).replace(' ', '')

    # get original string positions
    start = match.span(1)[0]
    end = match.span(1)[1]

    # get start and end portions of the original string
    first_part = my_string[:start]
    second_part = my_string[end:]
    
    # concatenate start + spaces removed section + end
    print('{} {} {}'.format(first_part, spaceless_numbers, second_part).strip())

Upvotes: 2

Views: 53

Answers (1)

Wiktor Stribiżew

Reputation: 627607

You can use

import re
my_strings = ["po box 12 34 5 heisenburg 902 rr 15", "po box 12 59 76", "po box 56 56 barry 56 87", "barry box 56 87"]
p = re.compile(r'\b(po\s+box\s*)(\d+(?:\s+\d+)+)')
for s in my_strings:
    print(s, ' => ', p.sub(lambda x: f"{x.group(1)}{''.join(c for c in x.group(2).split())}", s))

Output:

po box 12 34 5 heisenburg 902 rr 15  =>  po box 12345 heisenburg 902 rr 15
po box 12 59 76  =>  po box 125976
po box 56 56 barry 56 87  =>  po box 5656 barry 56 87
barry box 56 87  =>  barry box 56 87

The regex is

\b(po\s+box\s*)(\d+(?:\s+\d+)+)

Details:

  • \b - a word boundary
  • (po\s+box\s*) - Group 1: po, 1+ whitespaces, box, 0+ whitespaces
  • (\d+(?:\s+\d+)+) - Group 2: 1+ digits and one or more occurrences of 1+ whitespaces and 1+ digits
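To see what each group actually captures, here is a small sketch that runs the same pattern with search on one of the sample strings. The variable names (pattern, prefix, digits) are only illustrative and are not part of the code above.

```python
import re

# Same pattern as in the answer; the variable names are illustrative.
pattern = re.compile(r'\b(po\s+box\s*)(\d+(?:\s+\d+)+)')

m = pattern.search("po box 12 34 5 heisenburg 902 rr 15")
prefix = m.group(1)  # "po box" plus the whitespace that follows it
digits = m.group(2)  # the digit runs, still separated by spaces

print(repr(prefix))
print(repr(digits))

# A string that does not start the phrase with "po" yields no match at all,
# so sub() leaves it untouched.
print(pattern.search("barry box 56 87"))
```

Note that Group 2 requires at least two digit runs, so a string such as "po box 7" does not match either; since there are no spaces to remove in that case, leaving it unchanged is the desired behaviour.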

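The f"{x.group(1)}{''.join(c for c in x.group(2).split())}" replacement concatenates Group 1, left unchanged, with Group 2 after all whitespace has been removed from it. Passing a callable to sub is what makes this possible: it receives each match object and returns the replacement string for that match.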
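If the lambda feels dense, the same idea can be wrapped in a named function. This is only a sketch: squeeze_po_box_number and join_digits are hypothetical names, and ''.join(match.group(2).split()) is a slightly shorter equivalent of the generator expression used above.

```python
import re

# Same pattern as in the answer; the constant and function names are hypothetical.
PO_BOX_NUMBERS = re.compile(r'\b(po\s+box\s*)(\d+(?:\s+\d+)+)')


def squeeze_po_box_number(text):
    """Remove whitespace between the digit groups that directly follow 'po box'."""

    def join_digits(match):
        # Keep Group 1 as is; split Group 2 on whitespace and glue the pieces together.
        return match.group(1) + ''.join(match.group(2).split())

    return PO_BOX_NUMBERS.sub(join_digits, text)


for s in ["po box 12 59 76", "po box 56 56 barry 56 87", "barry box 56 87"]:
    print(s, ' => ', squeeze_po_box_number(s))
```

Digits that appear later in the string (such as "56 87" after "barry") are left alone, because the pattern only matches digit runs that directly follow "po box".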

Upvotes: 2
