Franc Weser

Reputation: 869

Regex match on multiple overlapping occurrences?

I have strings that look like:

sometext 3x 24x5 x 17.5 x 3 sometext

And I would like to concatenate all instances of digit + optional space + x + optional space + digit into digit + x + digit. Desired output:

sometext 3x24x5x17.5x3 sometext

My current Regex seems fine, but somehow it doesn't work:

re.sub(r'(\d)\s?([x])\s?(\d)', r'\1\2\3', 'sometext 3x 24x5 x 17.5 x 3 sometext')

Yields

sometext 3x24x5 x 17.5x3 sometext

It seems this is because the 5 in 24x5 has already been consumed by the previous match, so 5 x 17 is never considered. How can I adjust my regex to get the desired output, and is there a cleaner or more efficient way to write it than my approach? Thanks!
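A quick way to check that guess is to list what the pattern actually matches. This is only a minimal sketch: `re.finditer` yields the same non-overlapping, left-to-right matches that `re.sub` replaces, and the variable names are just for illustration.

```python
import re

s = 'sometext 3x 24x5 x 17.5 x 3 sometext'
pattern = re.compile(r'(\d)\s?([x])\s?(\d)')

# Each character can belong to at most one match, so a digit consumed
# as group 3 cannot also serve as group 1 of the following match.
matches = [m.group(0) for m in pattern.finditer(s)]
print(matches)  # ['3x 2', '4x5', '5 x 3']
```

The 5 of 24x5 ends the second match, so the following " x 1" has no leading digit left to match against.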

Upvotes: 1

Views: 62

Answers (2)

Wiktor Stribiżew

Reputation: 627082

I suggest two options:

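import re
s = 'sometext 3x 24x5 x 17.5 x 3 sometext'
# Option 1: drop whitespace between a digit and x, or between x and a digit
print(re.sub(r'(?<=\d)\s+(?=x)|(?<=x)\s+(?=\d)', '', s))
# Option 2: same idea, but only when x has a digit on both sides
print(re.sub(r'(?<=\d)\s+(?=x\s*\d)|(\d)\s*(x)\s+(?=\d)', r'\1\2', s))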

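See the Python demo. Both return sometext 3x24x5x17.5x3 sometext, but the second is more precise: it only removes whitespace when the x sits between two digits, while the first also strips whitespace next to any x that merely follows or precedes a digit.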
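For comparison, a smaller change to the regex from the question may also be enough: turn the trailing digit into a lookahead so it is checked but not consumed. This is a sketch rather than one of the two options above, and it uses \s* instead of \s? on the assumption that runs of several spaces should be collapsed too.

```python
import re

s = 'sometext 3x 24x5 x 17.5 x 3 sometext'

# The digit after x is only asserted via (?=\d), not consumed,
# so it stays available as the leading digit of the next match.
result = re.sub(r'(\d)\s*x\s*(?=\d)', r'\1x', s)
print(result)  # sometext 3x24x5x17.5x3 sometext
```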

Regex #1 details

  • (?<=\d)\s+(?=x) - one or more whitespaces between a digit and x
  • | - or
  • (?<=x)\s+(?=\d) - one or more whitespaces between an x and a digit

Regex #2 details

  • (?<=\d)\s+(?=x\s*\d) - one or more whitespaces preceded by a digit and followed by x, zero or more whitespaces and a digit
  • | - or
  • (\d)\s*(x)\s+(?=\d) - matches a digit (captured into Group 1), then zero or more whitespaces, then x (captured into Group 2), and then \s+ matches one or more whitespaces that must be followed by a digit.

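The replacement is the concatenation of the Group 1 and Group 2 values. When the first alternative matches, neither group participates; since Python 3.5 unmatched groups are replaced with an empty string, so the whitespace is simply removed.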
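The difference between the two patterns only shows up on input where x is also an ordinary letter. The sample string below is hypothetical (it is not from the question) and is there just to illustrate that:

```python
import re

# Hypothetical sample, not from the question: x also appears inside words here.
s = 'box 3 x 4 and 5 xylophones'

r1 = re.sub(r'(?<=\d)\s+(?=x)|(?<=x)\s+(?=\d)', '', s)
r2 = re.sub(r'(?<=\d)\s+(?=x\s*\d)|(\d)\s*(x)\s+(?=\d)', r'\1\2', s)

print(r1)  # box3x4 and 5xylophones
print(r2)  # box 3x4 and 5 xylophones
```

Regex #1 glues "box" to 3 and 5 to "xylophones", because it only looks at one side of the x; regex #2 leaves both alone.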

Upvotes: 1

Tim Biegeleisen

Reputation: 521997

You could use re.sub to identify all number-x terms, then use a callback to strip all whitespace from each match:

import re
inp = "sometext 3x 24x5 x 17.5 x 3 sometext 1 x 2.3 x 4"
# Match a whole run of numbers joined by x, then strip the whitespace inside it
output = re.sub(r'\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?)+', lambda m: re.sub(r'\s', '', m.group(0)), inp)
print(output)

This prints:

sometext 3x24x5x17.5x3 sometext 1x2.3x4
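If the one-liner is hard to read, the same pattern can be assembled from named pieces. This is only a sketch of the callback approach above; `NUMBER`, `DIMENSIONS` and `squeeze_dimensions` are names made up for illustration.

```python
import re

# An integer or decimal number, e.g. 3 or 17.5
NUMBER = r'\d+(?:\.\d+)?'
# A number followed by one or more "x number" parts, with optional whitespace
DIMENSIONS = re.compile(rf'{NUMBER}(?:\s*x\s*{NUMBER})+')

def squeeze_dimensions(text):
    """Remove all whitespace inside each number-x-number run."""
    return DIMENSIONS.sub(lambda m: re.sub(r'\s', '', m.group(0)), text)

print(squeeze_dimensions("sometext 3x 24x5 x 17.5 x 3 sometext 1 x 2.3 x 4"))
# sometext 3x24x5x17.5x3 sometext 1x2.3x4
```

Because each run is matched as a whole, an x that is not followed by a number is left untouched.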

Upvotes: 2
