aybe

Reputation: 16662

Why is a capture group needed for this regex to match?

Ideally, I'd like to avoid capturing groups altogether: assert that the string starts/ends with some sequence, and directly use the value matched by the regex.

Input:

    map_Ks     ./CarbonFiber_T.tga

Input definition:

Attempt 1: it works, but the result is in a capture group

(?:^\s+map_K.\s+)([^\x00-\x1F\x7C]+)$

  map_Ks     ./CarbonFiber_T.tga
./CarbonFiber_T.tga

Attempt 2: it works and there are no groups (the ideal usage), but the match is the entire line

(?=^\s+map_K.\s+)[^\x00-\x1F\x7C]+$

  map_Ks     ./CarbonFiber_T.tga

Question:

Is this possible at all, or am I asking too much of the regex engine and should simply use capture groups?

Upvotes: 1

Views: 50

Answers (1)

Wiktor Stribiżew

Reputation: 626853

You need to replace the lookahead with a lookbehind and require the first char of the consumed pattern to be a non-whitespace char.
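
To see why both changes are needed, here is a rough TypeScript sketch. TypeScript is only my choice for illustration: the question does not name a regex flavor, and this assumes an engine with variable-length lookbehind (JavaScript ES2018+ has it, as does .NET). The exact spacing of the sample line is also an assumption, and `naiveLookbehind` is a hypothetical intermediate pattern that merely swaps `(?=` for `(?<=`.

```typescript
// Sample line modeled on the question; the spacing is built explicitly
// (4 leading spaces, 5 separating spaces) and is an assumption.
const sampleLine = " ".repeat(4) + "map_Ks" + " ".repeat(5) + "./CarbonFiber_T.tga";

// Attempt 2 from the question: the lookahead is checked at position 0 and
// consumes nothing, so the character class then matches the whole line.
const lookaheadAttempt = /(?=^\s+map_K.\s+)[^\x00-\x1F\x7C]+$/;

// Hypothetical intermediate step: only swap the lookahead for a lookbehind.
// The lookbehind is already satisfied after the first space that follows
// "map_Ks", so the remaining separating spaces end up inside the match.
const naiveLookbehind = /(?<=^\s+map_K.\s+)[^\x00-\x1F\x7C]+$/;

const wholeLine = lookaheadAttempt.exec(sampleLine)?.[0] ?? null;
const paddedPath = naiveLookbehind.exec(sampleLine)?.[0] ?? null;

console.log(JSON.stringify(wholeLine));  // the entire input line
console.log(JSON.stringify(paddedPath)); // the path preceded by four spaces
```

The second result still carries leading whitespace, which is why the consumed part must also be forced to start with a non-whitespace char.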

You can use

(?<=^\s+map_K.\s+)(?=\S)[^\x00-\x1F\x7C]*(?<=\S)(?=\s*$)
(?<=^\s+map_K.\s+)[^\x00-\x1F\x7C\s](?:[^\x00-\x1F\x7C]*[^\x00-\x1F\x7C\s])?(?=\s*$)

See the regex demo (or this regex demo). Details:

  • (?<=^\s+map_K.\s+) - a positive lookbehind that matches a location that is immediately preceded by: start of string, one or more whitespaces, map_K, any one char other than an LF char, and one or more whitespaces
  • (?=\S) - a positive lookahead that requires the next char to be a non-whitespace char
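  • [^\x00-\x1F\x7C]* - zero or more chars other than ASCII control chars (\x00-\x1F) and the | char (\x7C); the surrounding lookarounds ensure that at least one char is matched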
  • (?<=\S) - the previous char must be a non-whitespace char
  • (?=\s*$) - a positive lookahead requiring zero or more whitespaces at the end of string immediately on the right.
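
As a quick check of the breakdown above, the first pattern can be run as follows. This is a minimal sketch in TypeScript, assuming an engine with variable-length lookbehind (ES2018+); `extractTexturePath` is a hypothetical helper name and the second and third sample lines are made up for illustration.

```typescript
// First pattern from the answer, used without any capture group.
const pathRegex = /(?<=^\s+map_K.\s+)(?=\S)[^\x00-\x1F\x7C]*(?<=\S)(?=\s*$)/;

// Hypothetical helper: returns the whole match (the path itself),
// or null when the line does not have the expected shape.
function extractTexturePath(line: string): string | null {
  const match = pathRegex.exec(line);
  return match ? match[0] : null;
}

// Line from the question: the match is the path alone.
console.log(extractTexturePath("    map_Ks     ./CarbonFiber_T.tga"));  // ./CarbonFiber_T.tga

// Trailing whitespace is left out, inner spaces are kept.
console.log(extractTexturePath("  map_Kd  ./my textures/wood.tga   ")); // ./my textures/wood.tga

// No leading whitespace, so ^\s+ in the lookbehind cannot match.
console.log(extractTexturePath("map_Ks ./CarbonFiber_T.tga"));          // null
```

Note that the pattern, as written, requires at least one whitespace char before map_K; if unindented lines should match too, `^\s+` would need to become `^\s*`.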

The [^\x00-\x1F\x7C\s](?:[^\x00-\x1F\x7C]*[^\x00-\x1F\x7C\s])? regex part matches one char that is not a whitespace, not an ASCII control char and not |. It is followed by an optional sequence: zero or more chars other than ASCII control chars and |, then a single char that is not a whitespace, not an ASCII control char and not |.
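
A small side-by-side run of the two patterns may help confirm they behave alike. This is only a sketch in TypeScript (assuming ES2018+ lookbehind support); the sample lines other than the first are invented, and agreement on these samples is not a proof of equivalence.

```typescript
// Variant 1: lookarounds guard the first and last char of the match.
const withLookarounds = /(?<=^\s+map_K.\s+)(?=\S)[^\x00-\x1F\x7C]*(?<=\S)(?=\s*$)/;

// Variant 2: the first and last char are matched by stricter character classes.
const withClasses = /(?<=^\s+map_K.\s+)[^\x00-\x1F\x7C\s](?:[^\x00-\x1F\x7C]*[^\x00-\x1F\x7C\s])?(?=\s*$)/;

const sampleLines = [
  "    map_Ks     ./CarbonFiber_T.tga", // line from the question
  "  map_Kd  ./my textures/wood.tga   ", // inner space, trailing whitespace
  " map_Ka x",                           // single-char path
  " map_Ks a|b",                         // contains the excluded | char
];

const comparison = sampleLines.map((line) => ({
  line,
  lookarounds: withLookarounds.exec(line)?.[0] ?? null,
  classes: withClasses.exec(line)?.[0] ?? null,
}));

for (const row of comparison) {
  console.log(JSON.stringify(row));
}
```

The last line yields null for both variants because | is excluded from the character class and the match must reach the end of the string.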

Just in case you want to adjust the file path regex part, please refer to "What characters are forbidden in Windows and Linux directory names?"

Upvotes: 1
