Th1sD0t

Reputation: 1119

How to tell a RegEx to be greedy on an 'Or' Expression

Text:

[A]I'm an example text [] But I want to be included [[]]
[A]I'm another text without a second part []

Regex:

\[A\][\s\S]*?(?:(?=\[\])|(?=\[\[\]\]))

Using the above regex, it's not possible to capture the second part of the first text.
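To make the problem concrete, here is a minimal sketch that runs the question's regex against the sample text. It uses Python's `re` module purely for illustration (the question does not name a language), and the variable names are placeholders. The lazy `[\s\S]*?` stops at the first position where either lookahead succeeds, which is just before the first `[]`.

```python
import re

# Sample text from the question.
text = (
    "[A]I'm an example text [] But I want to be included [[]]\n"
    "[A]I'm another text without a second part []"
)

# The regex from the question, unchanged.
pattern = re.compile(r"\[A\][\s\S]*?(?:(?=\[\])|(?=\[\[\]\]))")

# findall returns the full match text because the pattern has no capture groups.
matches = pattern.findall(text)
for m in matches:
    print(repr(m))
```

The first match ends before the first `[]`, so "But I want to be included" is never reached, regardless of the order of the two lookaheads.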

Demo

Is there a way to tell the regex to be greedy on the 'or'-part? I want to capture the biggest group possible.

Edit 1:

Original Attempt:

Demo

Edit 2:

What I want to achieve:

In our company, we use a web service to report our working time. I want to develop a desktop application that makes it easy to keep an eye on the time worked. I successfully downloaded the server's response (with all the necessary data), but unfortunately this data is in quite a bad state for processing.

Therefore I need to split the whole page into separate days. Unfortunately, a single day may have multiple time sets, e.g. 06:05 - 10:33; 10:55 - 13:13. The regular expression posted above splits a day's dataset after the first time set (so after 10:33). Therefore I want the regex to handle the 'or' part "greedily": if expression 1 (the larger one) matches, skip the second expression; if expression 1 does not match, use the second one.

Upvotes: 0

Views: 69

Answers (2)

Poul Bak

Reputation: 10930

I have changed your regex (actually simpler) to do what you want:

\[A\].*\[?\[\]\]?

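It starts by matching '[A]', then matches any number of characters other than line breaks (greedy), and finally '[]' with an optional extra '[' before it and an optional extra ']' after it (so it accepts '[]' or '[[]]').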
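A small check of this pattern against the question's sample text, sketched with Python's `re` module as an assumption (the answer does not name a language). Without the DOTALL flag, `.` does not cross line breaks, so each line is matched separately.

```python
import re

# Sample text from the question.
text = (
    "[A]I'm an example text [] But I want to be included [[]]\n"
    "[A]I'm another text without a second part []"
)

# Greedy .* runs to the end of the line, then backtracks until the
# bracket suffix can match.
pattern = re.compile(r"\[A\].*\[?\[\]\]?")

matches = pattern.findall(text)
for m in matches:
    print(repr(m))
```

On this input both lines are matched in full, including the trailing `[[]]` on the first line.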

Edit:

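This variant lists the double square brackets first (note that the greedy .* backtracks from the end of the line, so on a trailing [[]] it can still stop at the inner [] and leave the last ] out of the match):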

\[A\].*(?:\[\[\]\]|\[\])
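The following sketch, again in Python's `re` as an assumption, shows how this alternation behaves on the sample text. The second pattern, with `$` and the MULTILINE flag, is not part of the answer: it is a hypothetical adjustment that only helps if the bracket marker always sits at the end of a line.

```python
import re

# Sample text from the question.
text = (
    "[A]I'm an example text [] But I want to be included [[]]\n"
    "[A]I'm another text without a second part []"
)

# Pattern from the answer: double brackets listed first in the alternation.
prefer_double = re.compile(r"\[A\].*(?:\[\[\]\]|\[\])")

# Hypothetical variant (an assumption, not from the answer): require the
# marker to be the last thing on the line.
anchored = re.compile(r"\[A\].*(?:\[\[\]\]|\[\])$", re.MULTILINE)

unanchored_matches = prefer_double.findall(text)
anchored_matches = anchored.findall(text)

print(unanchored_matches)
print(anchored_matches)
```

With the unanchored pattern, the first match stops one character short of the closing `]` because backtracking reaches the inner `[]` before it reaches `[[]]`. The anchored variant matches both lines completely.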

Upvotes: 1

Wiktor Stribiżew

Reputation: 626927

You may use

\[A][\s\S]*?(?=\[A]|$)

See the regex demo.

Details

  • \[A] - a [A] substring
  • [\s\S]*? - any 0+ chars as few as possible
  • (?=\[A]|$) - a location that is immediately followed with [A] or end of string.
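The breakdown above can be tried out with a short sketch. It uses Python's `re` module as an assumption (the pattern itself is unchanged), and the sample text is the one from the question.

```python
import re

# Sample text from the question.
text = (
    "[A]I'm an example text [] But I want to be included [[]]\n"
    "[A]I'm another text without a second part []"
)

# Each match runs from one [A] up to (not including) the next [A],
# or to the end of the string for the last record.
pattern = re.compile(r"\[A][\s\S]*?(?=\[A]|$)")

records = pattern.findall(text)
for r in records:
    print(repr(r))
```

Because `[\s\S]` also matches line breaks, the first record keeps its trailing newline; stripping each record afterwards is one way to tidy that up.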

In C#, you actually may even use a split operation:

Regex.Split(s, @"(?!^)(?=\[A])")

See this .NET regex demo. The (?!^)(?=\[A]) regex matches a location in a string that is not at the start and that is immediately followed with [A].
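The same split idea can be sketched outside .NET. The example below uses Python's `re.split` as a stand-in for `Regex.Split`; it assumes Python 3.7 or later, where splitting on zero-width matches is supported.

```python
import re

# Sample text from the question.
text = (
    "[A]I'm an example text [] But I want to be included [[]]\n"
    "[A]I'm another text without a second part []"
)

# Split at every position that is not the start of the string
# and is immediately followed by [A].
parts = re.split(r"(?!^)(?=\[A])", text)

for p in parts:
    print(repr(p))
```

The `(?!^)` guard avoids an empty first element, since the text itself begins with `[A]`.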

If instead of A there can be any letter, replace A with [A-Z] or [A-Z]+.
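As a rough illustration of that generalisation, the sketch below swaps `A` for `[A-Z]` in both places. It is written in Python's `re` as an assumption, and the sample string with `[B]` and `[C]` markers is made up for this example rather than taken from the question.

```python
import re

# Made-up sample with different single-letter markers (hypothetical data).
sample = "[A]one [] [B]two [[]] [C]three []"

# Same structure as the pattern above, with A replaced by the class [A-Z].
pattern = re.compile(r"\[[A-Z]][\s\S]*?(?=\[[A-Z]]|$)")

chunks = pattern.findall(sample)
print(chunks)
```

The `[]` and `[[]]` markers do not trigger the lookahead, because the character after the opening bracket is not an uppercase letter.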


Upvotes: 1
