schiv

Reputation: 23

Regex not as greedy as expected /^(\d+)[^_]/

Regex: /^(\d+)[^_]/gm
Test String: 12_34

I'd expect this regex not to match the test string, because \d+ is greedy and consumes the digits 1 and 2, and [^_] then fails on _.

But it unexpectedly matches, with only 1 in group 1. Where am I wrong?

I'm trying to find a regular expression that matches the digits in the test strings "12" or "12xx" but does not match "12_xx".

Sample: https://regex101.com/r/0QRTjs/1/
Dialect: In the end I'll use Microsoft System.Text.RegularExpressions.

Upvotes: 1

Views: 347

Answers (2)

dawg

Reputation: 103884

\d+ is greedy, but it will give characters back through backtracking if that lets the overall pattern match. Here \d+ first takes 12 and [^_] fails on _, so \d+ backtracks to just 1. The 2 then satisfies [^_], and 1 is what ends up captured.
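
The backtracking can be observed directly. Below is a minimal sketch in Python, used here only for illustration (the question targets .NET); its `re` module backtracks the same way for this pattern:

```python
import re

# The pattern from the question, applied to the test string.
m = re.match(r'^(\d+)[^_]', '12_34')

# \d+ first grabs "12", then [^_] fails on "_".
# The engine backtracks: \d+ gives up the "2", which then satisfies [^_].
print(m.group(0))  # whole match: "12"
print(m.group(1))  # captured digits: "1"
```

The whole match is still "12"; only the capture group shrinks to "1".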

You can use a negative lookahead at the start of the match:

/^(?!\d+_)(\d+)/
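
As a quick check of the lookahead variant against the strings from the question, here is a small Python sketch. The helper name `leading_digits` is hypothetical and only serves the illustration:

```python
import re

# Negative lookahead: refuse to match if the leading digits are followed by "_".
pattern = re.compile(r'^(?!\d+_)(\d+)')

def leading_digits(s):
    """Hypothetical helper: return the captured digits, or None if there is no match."""
    m = pattern.match(s)
    return m.group(1) if m else None

for s in ['12', '12xx', '12_xx', '12_34']:
    print(repr(s), '->', leading_digits(s))
```

"12" and "12xx" yield "12", while "12_xx" and "12_34" are rejected.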

Or you can use an atomic group that disallows backtracking:

/^((?>\d+))(?:[^_]|$)/
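
The atomic group can be tried out in Python as well, with one caveat: Python's `re` only accepts `(?>...)` from version 3.11 on. The sketch below therefore falls back to a lookahead-plus-backreference emulation on older versions. That fallback is a different technique swapped in for illustration, and the helper name `atomic_digits` is hypothetical:

```python
import re
import sys

# Atomic group syntax (?>...) is only accepted by Python's re from 3.11 on.
ATOMIC = r'^((?>\d+))(?:[^_]|$)'
# Fallback for older versions: a lookahead captures the digits and \1 consumes
# them; the engine does not backtrack into a completed lookahead.
EMULATED = r'^(?=(\d+))\1(?:[^_]|$)'

pattern = re.compile(ATOMIC if sys.version_info >= (3, 11) else EMULATED)

def atomic_digits(s):
    """Hypothetical helper: return the captured digits, or None if there is no match."""
    m = pattern.match(s)
    return m.group(1) if m else None

for s in ['12', '12xx', '12_xx', '12_34']:
    print(repr(s), '->', atomic_digits(s))
```

Since \d+ can no longer give back the 2, "12_xx" and "12_34" fail outright. .NET's System.Text.RegularExpressions supports `(?>...)` natively, so no fallback should be needed there.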

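Or, in engines that support it (PCRE and Java do; .NET's System.Text.RegularExpressions does not), use the possessive quantifier ++, which also disallows backtracking:

/^(\d++)(?:[^_]|$)/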

The possessive quantifier is likely the fastest...

Upvotes: 0

Tranbi

Reputation: 12711

\d+ matches one or more digits.
Since you append [^_], the digits must be followed by a character other than _.
Therefore \d+ cannot match 12, because 12 is followed by _.
So 1 ends up in the first capturing group, because it is followed by 2, which is not _.

If you want to match lines that contain only digits, there is a very simple expression:

^(\d+)$

Upvotes: 0
