Reputation: 1147
First of all, the regex needs to work in both Python and PCRE (PHP). I'm trying to skip a match when it is followed by the letter 'x', so that I can distinguish dimensions from strings like "number/number" in the example below:
dummy word 222/2334; Ø14 x Ø6,33/523,23 x 2311 mm
From here, I'm trying to extract 222/2334 but not 6,33/523,23, since that part belongs to the dimensions. So far I have come up with this regex:
((\d*(?:,?\.?)\d*(?:,?\.?))\s?\/\s?(\d*(?:,?\.?)\d*(?:,?\.?)))(?=\s?x)
which extracts exactly the part I don't want. If I change the positive lookahead to a negative one, it captures both, except for the last '3' of 6,33/523,23. How can I capture only 222/2334? What am I doing wrong here?
Desired output:
222/2334
What I got:
222/2334 6,33/523,2
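The result above can be reproduced with Python's re module. This is only a minimal sketch: the names TEXT, NUM and matches are made up for illustration, and the only change to the regex from the question is swapping (?= for (?!.

```python
import re

TEXT = "dummy word 222/2334; Ø14 x Ø6,33/523,23 x 2311 mm"

# Number subpattern from the question (the name NUM is illustrative).
NUM = r"\d*(?:,?\.?)\d*(?:,?\.?)"

# The question's regex with the positive lookahead turned into a negative one.
negative = re.compile(r"((" + NUM + r")\s?\/\s?(" + NUM + r"))(?!\s?x)")

matches = [m.group(0) for m in negative.finditer(TEXT)]
print(matches)  # ['222/2334', '6,33/523,2']
```

The lookahead fails right after 523,23 because " x" follows, so the engine backtracks by one digit; at that position the next character is '3', not 'x', and the lookahead succeeds.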
Upvotes: 1
Views: 113
Reputation: 785761
You may use this simplified regex with negative lookahead:
((\d*(?:,?\.?)\d*(?:,?\.?))\s?\/\s?(\d*(?:,?\.?)\d*(?:,?\.?)))\b(?![.,]?\d|\s?x)
It is important to use a word boundary at the end to avoid matching partial numbers (this is why your regex matched up to one digit short). Also include [.,]?\d in the negative lookahead so that the match cannot end just before the last comma.
This shorter (and more efficient) regex may also work for OP:
(\d+(?:[,.]\d+)*)\s*\/\s*(\d+(?:[,.]\d+)*)\b(?![.,]?\d|\s?x)
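A quick check of the shorter regex against the sample string from the question, as a sketch using Python's re module (TEXT, fraction, found and parts are illustrative names):

```python
import re

TEXT = "dummy word 222/2334; Ø14 x Ø6,33/523,23 x 2311 mm"

# \b stops the match from ending inside a number; the lookahead rejects
# a match that is continued by more digits or followed by "x".
fraction = re.compile(
    r"(\d+(?:[,.]\d+)*)\s*\/\s*(\d+(?:[,.]\d+)*)\b(?![.,]?\d|\s?x)"
)

found = [m.group(0) for m in fraction.finditer(TEXT)]
parts = [m.groups() for m in fraction.finditer(TEXT)]
print(found)  # ['222/2334']
print(parts)  # [('222', '2334')]
```

finditer with group(0) is used rather than findall because the pattern has two capture groups, so findall would return tuples instead of the whole match.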
Upvotes: 1
Reputation: 22837
There are two easy options.
The first option is long and ugly: a negative lookahead first rejects any position where the pattern is followed by x, and then the pattern itself is matched.
(?!PATTERN(?=x))PATTERN
(?!\d+(?:[,.]\d+)?\s?\/\s?\d+(?:[,.]\d+)?(?=\s?x))(\d+(?:[,.]\d+)?)\s?\/\s?(\d+(?:[,.]\d+)?)
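The first option works with the standard re module. Below is a small sketch that builds the regex from its repeated parts so the (?!PATTERN(?=x))PATTERN shape stays visible; NUM, FRACTION and option_one are names chosen here for illustration.

```python
import re

TEXT = "dummy word 222/2334; Ø14 x Ø6,33/523,23 x 2311 mm"

NUM = r"\d+(?:[,.]\d+)?"             # digits, optionally , or . plus digits
FRACTION = NUM + r"\s?\/\s?" + NUM   # PATTERN without capture groups

# (?!PATTERN(?=\s?x)) rejects the position, then PATTERN is matched with groups.
option_one = re.compile(
    r"(?!" + FRACTION + r"(?=\s?x))(" + NUM + r")\s?\/\s?(" + NUM + r")"
)

found = [m.group(0) for m in option_one.finditer(TEXT)]
print(found)  # ['222/2334']
```

Starting later inside the dimension (at 33/523,23 or 3/523,23) does not help the engine, because those shorter candidates are followed by x as well.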
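The second option uses possessive quantifiers. Every quantifier inside the number subpattern has to be possessive (\d++ as well as ?+); with a plain \d+ the engine can still give back digits and match 6,33/52. In Python this needs the regex module, or re from Python 3.11 onward.
(\d++(?:[,.]\d++)?+)\s?\/\s?(\d++(?:[,.]\d++)?+)(?!\s?x)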
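A sketch of the possessive approach in Python. Two assumptions are made here: the interpreter is Python 3.11 or newer (where re accepts possessive quantifiers; older versions need the third-party regex module), and every quantifier in the number subpattern is written possessively (\d++ as well as ?+), since a plain \d+ could still give back digits to satisfy the lookahead. The helper find_fractions is a hypothetical name.

```python
import re

TEXT = "dummy word 222/2334; Ø14 x Ø6,33/523,23 x 2311 mm"

# Fully possessive number subpattern: neither \d++ nor ?+ gives characters back.
NUM = r"\d++(?:[,.]\d++)?+"

def find_fractions(text):
    """Return number/number matches not followed by 'x', or None when this
    re module rejects possessive quantifiers (Python before 3.11)."""
    try:
        pattern = re.compile(r"(" + NUM + r")\s?\/\s?(" + NUM + r")(?!\s?x)")
    except re.error:
        return None
    return [m.group(0) for m in pattern.finditer(text)]

result = find_fractions(TEXT)
print(result)  # ['222/2334'] on Python 3.11+, None on older versions
```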
Additionally, I changed your subpattern to \d+(?:[,.]\d+)?. This matches one or more digits, optionally followed by . or , and one or more digits.
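To see what the change of subpattern buys, the two versions can be compared on a few sample strings. This is an illustrative sketch; OLD_NUM, NEW_NUM and samples are names and inputs chosen here, not part of the question.

```python
import re

OLD_NUM = re.compile(r"\d*(?:,?\.?)\d*(?:,?\.?)")  # subpattern from the question
NEW_NUM = re.compile(r"\d+(?:[,.]\d+)?")           # subpattern from this answer

samples = ["222", "6,33", "1.5", "", "6,", ",."]

old_ok = [bool(OLD_NUM.fullmatch(s)) for s in samples]
new_ok = [bool(NEW_NUM.fullmatch(s)) for s in samples]

for s, old, new in zip(samples, old_ok, new_ok):
    print(repr(s), old, new)
```

Because every part of the old subpattern is optional, it accepts the empty string and stray separators, whereas the new one requires at least one digit and a digit after any separator.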
Upvotes: 1