DPF

Reputation: 290

RegEx: Match portion of config string (by defined region)

I need to isolate the portion of a string that follows the key %CONFIG\n. Trailing newlines and any other regions that begin with % should be excluded from the match.

My example string:

Configuration File

Format
<Identifier>: <init> <start> <end> <step>

%CONFIG
Line A: 0 1000 5000 300
Line B: 0 0 200 20



%OPTIONAL_OTHER_KEY

some other definitions

where the only match should be:

Line A: 0 1000 5000 300
Line B: 0 0 200 20

Treat everything from %OPTIONAL_OTHER_KEY onward as optional content of the input string; it must not be included in the match.

I already have (?<=%CONFIG\n)[\w\W]*(?=%), but it does not strip the trailing newlines.

Upvotes: 1

Views: 57

Answers (1)

Wiktor Stribiżew

Reputation: 626816

When you need to leave some whitespace out of the match, give the generic subpattern that comes right before it a lazy quantifier (if other means are not working) and give the whitespace subpattern a greedy quantifier. Note that in some languages, such as Tcl, you should not mix lazy and greedy quantifiers; I hope that is not the case here. This is quick to implement, but it might require adjusting if any performance issues occur.

So, you can use

(?<=%CONFIG\n)[\w\W]*?(?=\s*%)
                     ^   ^^^

See regex demo

Here, [\w\W]*? uses the lazy quantifier *? and matches zero or more of any character, as few as possible. \s*, which matches zero or more whitespace characters, as many as possible, is added to the lookahead so that the trailing whitespace is not part of the match.
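The question does not name a language, so the following is only a minimal sketch assuming Python's re module. It compares the original greedy pattern with the lazy one on a shortened copy of the sample input; the names sample, greedy, and lazy are illustrative.

```python
import re

# Shortened version of the sample input from the question (illustrative).
sample = (
    "Configuration File\n\n"
    "%CONFIG\n"
    "Line A: 0 1000 5000 300\n"
    "Line B: 0 0 200 20\n"
    "\n\n\n"
    "%OPTIONAL_OTHER_KEY\n\n"
    "some other definitions\n"
)

# Original attempt: the greedy quantifier runs up to the last "%",
# so the blank lines before %OPTIONAL_OTHER_KEY end up in the match.
greedy = re.search(r"(?<=%CONFIG\n)[\w\W]*(?=%)", sample).group(0)

# Lazy body plus greedy \s* inside the lookahead: the match stops at the
# first point where only whitespace separates it from a "%".
lazy = re.search(r"(?<=%CONFIG\n)[\w\W]*?(?=\s*%)", sample).group(0)

print(repr(greedy))
print(repr(lazy))
```

Printing with repr makes the trailing newlines visible, so the difference between the two results is easy to see.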

However, the lookahead (?=\s*%) only succeeds when another % follows the %CONFIG section. If there might be no % after %CONFIG, you need to use an unrolled version of the lazy quantifier pattern:

(?<=%CONFIG\n)\S*(?:\s+[^\s%]\S*)*

See the demo
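As a rough check of the unrolled pattern, here is a minimal sketch, again assuming Python's re module since the question does not name a language. The helper extract_config and the test strings are hypothetical names introduced for illustration.

```python
import re

# Unrolled pattern from the answer: runs of non-whitespace joined by
# whitespace, where a whitespace run is only consumed if the character
# after it is neither whitespace nor "%".
UNROLLED = re.compile(r"(?<=%CONFIG\n)\S*(?:\s+[^\s%]\S*)*")


def extract_config(text):
    """Return the %CONFIG block without trailing whitespace, or None."""
    match = UNROLLED.search(text)
    return match.group(0) if match else None


body = "Line A: 0 1000 5000 300\nLine B: 0 0 200 20"

# Input that ends after the %CONFIG block, with trailing blank lines only.
no_other_key = "Format\n\n%CONFIG\n" + body + "\n\n\n"

# Same input followed by another %-region.
with_other_key = no_other_key + "%OPTIONAL_OTHER_KEY\n\nsome other definitions\n"

print(extract_config(no_other_key) == body)
print(extract_config(with_other_key) == body)
```

Because this pattern has no lookahead that requires a %, it should work whether or not another %-region follows the block.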

Upvotes: 1
