Samul

Reputation: 2079

Regex with negative lookahead does not work on substring

I came up with the regex below, which replaces every number followed by % with $1 (the first capture group), except for the numbers 0 and 100.

(^|[^0-9])(?!(?:0|100))[0-9]+ *%

For example, the string:

80% 95% 100 % xxx 05% nnn 4 % ppp 32% fff 100 % oo 0% iii

Should become:

100 % xxx nnn ppp fff 100 % oo 0% iii

However it's coming out:

100 % xxx 05% nnn ppp fff 100 % oo 0% iii

For some reason the 05% is not being removed. How do I fix this?

Upvotes: 2

Views: 48

Answers (2)

JvdV

Reputation: 75960

Maybe you can use:

\b((100|0)|\d+)\b(\s*%\s*)
  • \b - Word-boundary;
  • ((100|0)|\d+) - Capture group with a nested group to match 100 or 0. The alternative is 1+ digits;
  • \b - Word-boundary;
  • (\s*%\s*) - A 3rd group to capture spaces and the percent sign.

And replace with:

${2:+$2$3}

The conditional string replacement works for PCRE(2) engines and would only add back capture groups 2 and 3 if 2 is matched. Otherwise it returns an empty string. This will also get rid of the extra spaces that you might want to remove.
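Not every engine supports the ${2:+...} syntax. As a rough sketch, assuming Python's re module (which has no conditional replacement syntax), a replacement function can approximate the same behaviour. The helper name keep_0_and_100 is made up for this example.

```python
import re

# Same pattern as above: group 2 is only set when the number is 100 or 0.
PATTERN = re.compile(r"\b((100|0)|\d+)\b(\s*%\s*)")

def keep_0_and_100(match: re.Match) -> str:
    # Emulates ${2:+$2$3}: put groups 2 and 3 back only if group 2 took part
    # in the match; otherwise replace the whole match with an empty string.
    if match.group(2) is not None:
        return match.group(2) + match.group(3)
    return ""

text = "80% 95% 100 % xxx 05% nnn 4 % ppp 32% fff 100 % oo 0% iii"
result = PATTERN.sub(keep_0_and_100, text)
print(result)
```

Because group 3 also consumes the whitespace after the %, the removed percentages should not leave doubled spaces behind.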

See an online demo

Upvotes: 1

The fourth bird

Reputation: 163577

Your pattern does not match 05% because the alternation inside (?!(?:0|100)) matches the leading 0 of 05, so the negative lookahead fails at that position.

What you could do is add a word boundary after the alternation:

(^|[^0-9])(?!(?:0|100)\b)[0-9]+ *%

See the updated regex
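To compare the two patterns side by side, here is a minimal sketch assuming Python's re module; the question does not say which engine is used, so treat the language choice as an assumption. The helper strip_percentages is a hypothetical name, and it also collapses the leftover spaces so the output is easier to compare with the expected string.

```python
import re

text = "80% 95% 100 % xxx 05% nnn 4 % ppp 32% fff 100 % oo 0% iii"

# Pattern from the question: the lookahead rejects any number starting with 0 or 100.
original = re.compile(r"(^|[^0-9])(?!(?:0|100))[0-9]+ *%")
# Same pattern with \b, so only the whole numbers 0 and 100 are rejected.
fixed = re.compile(r"(^|[^0-9])(?!(?:0|100)\b)[0-9]+ *%")

def strip_percentages(pattern, s):
    # Replace each match with group 1, then collapse leftover runs of whitespace.
    return " ".join(pattern.sub(r"\1", s).split())

before = strip_percentages(original, text)
after = strip_percentages(fixed, text)
print(before)
print(after)
```

With the original pattern 05% should survive; with the word boundary added it should be removed while 100 % and 0% stay in place.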

Or you might use a leading word boundary with a match only:

\b(?!(?:10)?0\s*%)\d+\s*%

The pattern matches:

  • \b A word boundary to prevent a partial word match
  • (?! Negative lookahead, assert that what is directly to the right of the current position is not
    • (?:10)?0\s*% Match optional 10 followed by a mandatory 0, optional whitespace chars and then a %
  • ) Close the lookahead
  • \d+\s*% Match 1+ digits followed by optional whitespace chars and %

See a regex101 demo

In the replacement you can use an empty string.
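As an illustration of the match-only variant, the sketch below again assumes Python's re module. It first lists what the pattern matches, then substitutes an empty string. The whitespace clean-up at the end is an extra step added for this example, not part of the regex.

```python
import re

text = "80% 95% 100 % xxx 05% nnn 4 % ppp 32% fff 100 % oo 0% iii"

# Match-only pattern: no capture group is needed in the replacement.
match_only = re.compile(r"\b(?!(?:10)?0\s*%)\d+\s*%")

# The pattern has no capture groups, so findall returns the full matches.
matches = match_only.findall(text)
print(matches)

# Replace every match with an empty string.
cleaned = match_only.sub("", text)

# Extra step for this example: collapse the spaces left behind by the removals.
normalized = " ".join(cleaned.split())
print(normalized)
```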

Upvotes: 4
