MehmedB

Reputation: 1147

Why does a positive lookahead work here but a negative lookahead doesn't?

First of all, the regex needs to work in both Python and PCRE (PHP). I'm trying to skip a match when it is followed by the letter 'x', to distinguish dimensions from strings like "number/number" in the example below:

dummy word 222/2334; Ø14 x Ø6,33/523,23 x 2311 mm

From here, I'm trying to extract 222/2334 but not 6,33/523,23, since the latter is part of the dimensions. So far I have come up with this regex:

((\d*(?:,?\.?)\d*(?:,?\.?))\s?\/\s?(\d*(?:,?\.?)\d*(?:,?\.?)))(?=\s?x)

which extracts exactly the part I don't want (6,33/523,23). If I change the positive lookahead to a negative one, it captures both, except for the last '3' of 6,33/523,23. How can I capture only 222/2334? What am I doing wrong here?

Desired output:

222/2334

What I got:

222/2334 6,33/523,2

Upvotes: 1

Views: 113

Answers (2)

anubhava

Reputation: 785761

You may use this simplified regex with negative lookahead:

((\d*(?:,?\.?)\d*(?:,?\.?))\s?\/\s?(\d*(?:,?\.?)\d*(?:,?\.?)))\b(?![.,]?\d|\s?x)


  • It is important to use a word boundary at the end to avoid matching partial numbers (this is why your regex stopped one digit short).

  • Also include [.,]?\d in the negative lookahead so that the match cannot end just before or just after the last comma.
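The effect of each of the two changes can be checked with a small Python sketch. It reuses the pattern from the question and swaps only the ending; the variable names and labels are illustrative and not part of the original answer.

```python
import re

text = "dummy word 222/2334; Ø14 x Ø6,33/523,23 x 2311 mm"

# The number/number part of the pattern from the question, without any ending.
number = r"\d*(?:,?\.?)\d*(?:,?\.?)"
base = rf"(({number})\s?\/\s?({number}))"

# Three endings, from the original attempt to the full fix.
endings = {
    "lookahead only": r"(?!\s?x)",
    "with word boundary": r"\b(?!\s?x)",
    "with boundary and digit check": r"\b(?![.,]?\d|\s?x)",
}

results = {
    label: [m.group(1) for m in re.finditer(base + ending, text)]
    for label, ending in endings.items()
}

for label, matches in results.items():
    print(f"{label}: {matches}")
# lookahead only: ['222/2334', '6,33/523,2']
# with word boundary: ['222/2334', '6,33/523,']
# with boundary and digit check: ['222/2334']
```

With the lookahead alone, the engine backtracks one digit so that the next character is '3' instead of ' x'. The word boundary rules that out, and the extra [.,]?\d alternative rules out stopping around the comma.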


This shorter (and more efficient) regex may also work for OP:

(\d+(?:[,.]\d+)*)\s*\/\s*(\d+(?:[,.]\d+)*)\b(?![.,]?\d|\s?x)
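As a rough sketch of how the shorter pattern could be used from Python: extract_ratios is a hypothetical helper name, and the second and third inputs in the usage note are made-up test strings, not taken from the question.

```python
import re

# The shorter pattern from above, compiled once.
RATIO = re.compile(
    r"(\d+(?:[,.]\d+)*)\s*\/\s*(\d+(?:[,.]\d+)*)\b(?![.,]?\d|\s?x)"
)


def extract_ratios(text):
    """Return every number/number match that is not followed by 'x'."""
    return [m.group(0) for m in RATIO.finditer(text)]


sample = "dummy word 222/2334; Ø14 x Ø6,33/523,23 x 2311 mm"
print(extract_ratios(sample))  # ['222/2334']
```

Under the same assumptions, extract_ratios("12/34 x 5") returns an empty list, while extract_ratios("1.5 / 2.5 mm") returns ['1.5 / 2.5']. The pattern uses no possessive quantifiers, so it should behave the same in PCRE.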


Upvotes: 1

ctwheels

Reputation: 22837

There are two easy options.

The first option is long and ugly: a negative lookahead at the start rejects any position where the whole pattern is followed by x, and the pattern is then matched normally.

(?!PATTERN(?=x))PATTERN


(?!\d+(?:[,.]\d+)?\s?\/\s?\d+(?:[,.]\d+)?(?=\s?x))(\d+(?:[,.]\d+)?)\s?\/\s?(\d+(?:[,.]\d+)?)
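A minimal Python sketch of this first option, built from its parts so the (?!PATTERN(?=x))PATTERN shape stays visible. NUM, PAIR and GUARDED are illustrative names, and the second test string in the usage note is made up.

```python
import re

# One number: digits, optionally followed by . or , and more digits.
NUM = r"\d+(?:[,.]\d+)?"
# PATTERN: number / number, without capture groups (used inside the lookahead).
PAIR = rf"{NUM}\s?\/\s?{NUM}"

# (?!PATTERN(?=\s?x))PATTERN, with capture groups only in the second copy.
GUARDED = re.compile(rf"(?!{PAIR}(?=\s?x))({NUM})\s?\/\s?({NUM})")

text = "dummy word 222/2334; Ø14 x Ø6,33/523,23 x 2311 mm"
print(GUARDED.findall(text))  # [('222', '2334')]
```

Because the lookahead sits at the start of the match, the engine cannot escape it by giving back a trailing digit. For a string such as "6,33/523,23 mm", where no x follows, findall returns [('6,33', '523,23')]. This version needs only the standard re module.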

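The second option uses possessive quantifiers. In Python this needs either the third-party regex module or Python 3.11+, where re also supports them. Note that the leading digits are possessive too (\d++): with a plain \d+ the engine can backtrack into 523 and still match 6,33/52.

(\d++(?:[,.]\d+)?+)\s?\/\s?(\d++(?:[,.]\d+)?+)(?!\s?x)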

Additionally, I changed your subpattern to \d+(?:[,.]\d+)?. This will match one or more digits, then optionally match . or , followed by one or more digits.
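To illustrate the difference between the two subpatterns, the following sketch compares them with re.fullmatch on a few made-up inputs (the sample strings and variable names are assumptions for illustration only).

```python
import re

original = re.compile(r"\d*(?:,?\.?)\d*(?:,?\.?)")  # subpattern from the question
revised = re.compile(r"\d+(?:[,.]\d+)?")            # subpattern used in this answer

samples = ["222", "6,33", "523.23", "6,", ",", ""]

# Map each sample to (matches original?, matches revised?).
results = {
    s: (bool(original.fullmatch(s)), bool(revised.fullmatch(s)))
    for s in samples
}

for s, (old, new) in results.items():
    print(f"{s!r:10} original={old} revised={new}")
```

The original subpattern accepts every sample, including the empty string and a lone comma, because all of its parts are optional. The revised one accepts only the first three.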

Upvotes: 1
