openshac

Reputation: 5165

How to make a lookbehind greedy

I am trying to match the text between two markers/tags:

-- #begin free text

this is the first bit of text I want to match
blah blah blah
this is the end of the matching text

-- #end free text

I have nearly managed to do this with the following .NET regex:

(?s)(?<=-- #begin free text\s*)(?<freeText>(.+?))(?=\s+-- #end free text)

However, instead of beginning with "this is the...", the match also includes the two preceding line breaks, i.e. "\n\nthis is the ...".

How can I ensure the preceding line breaks (however many there are) are not included in the match?
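The behaviour is easy to reproduce. The sketch below assumes .NET 6 or later (C# top-level statements) and `\n` line endings, with the sample text copied from above. It shows why no amount of greediness in the lookbehind helps: a lookbehind is zero-width and never consumes the whitespace, and the engine starts the match at the leftmost position where the assertion holds. That position is directly after the marker, where `\s*` matches nothing.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text modeled on the question (assumes "\n" line endings).
string input =
    "-- #begin free text\n\n" +
    "this is the first bit of text I want to match\n" +
    "blah blah blah\n" +
    "this is the end of the matching text\n\n" +
    "-- #end free text";

// The pattern from the question.
var question = new Regex(
    @"(?s)(?<=-- #begin free text\s*)(?<freeText>(.+?))(?=\s+-- #end free text)");
string freeText = question.Match(input).Groups["freeText"].Value;

// The lookbehind on its own: being zero-width, it first holds at the
// position right after the marker, with \s* matching zero characters.
int firstStart = Regex.Match(input, @"(?<=-- #begin free text\s*)").Index;

Console.WriteLine(freeText.StartsWith("\n\n", StringComparison.Ordinal)); // True
Console.WriteLine(firstStart == "-- #begin free text".Length);            // True
```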

Upvotes: 1

Views: 772

Answers (2)

Alan Moore

Reputation: 75242

Do you really need lookarounds? This works for me:

Regex r = new Regex(
    @"(?s)-- #begin free text\s+(?<freeText>(.+?))\s+-- #end free text");
// Read the named group defined in the pattern ("freeText").
string text = r.Match(subjectString).Groups["freeText"].Value;

Lookarounds are invaluable when you need them, but most of the time they just get in your way. This is much less true of .NET regexes with their "anything goes" lookbehinds, but it still applies.
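A self-contained version of this approach, as a sketch assuming .NET 6 or later and `\n` line endings. Because the delimiters are consumed rather than asserted, `Match.Value` contains them, and the free text has to be read from the named group. The redundant inner group of the pattern is dropped here, which does not change what `freeText` captures.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text modeled on the question (assumes "\n" line endings).
string input =
    "-- #begin free text\n\n" +
    "this is the first bit of text I want to match\n" +
    "blah blah blah\n" +
    "this is the end of the matching text\n\n" +
    "-- #end free text";

// No lookarounds: the delimiters and the whitespace next to them are
// consumed, and only the named group holds the free text.
var r = new Regex(@"(?s)-- #begin free text\s+(?<freeText>.+?)\s+-- #end free text");
Match m = r.Match(input);

string whole = m.Value;                    // includes both delimiters
string text = m.Groups["freeText"].Value;  // just the free text

Console.WriteLine(whole.StartsWith("-- #begin", StringComparison.Ordinal)); // True
Console.WriteLine(text);
```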

Upvotes: 1

zx81

Reputation: 41838

Use this:

(?s)(?<=-- #begin free text\s*)\S.*?(?=\s*-- #end free text)

In C#:

var myRegex = new Regex(@"(?s)(?<=-- #begin free text\s*)\S.*?(?=\s*-- #end free text)", RegexOptions.Multiline);
string resultString = myRegex.Match(yourString).Value;
Console.WriteLine(resultString);

The match:

this is the first bit of text I want to match\nblah blah blah\nthis is the end of the matching text

Explanation

  • (?s) turns on single-line mode (RegexOptions.Singleline, often called DOTALL elsewhere), so the dot also matches line breaks
  • The lookbehind (?<=-- #begin free text\s*) checks that the current position is preceded by the starting delimiter and optional whitespace, including line breaks
  • \S matches a non-whitespace character, so the match cannot start on one of the blank lines
  • .*? lazily matches any characters up to...
  • a position where the lookahead (?=\s*-- #end free text) can assert that what follows is optional whitespace and then the ending delimiter

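The explanation can be checked directly. This sketch assumes .NET 6 or later and `\n` line endings; `RegexOptions.Multiline` is omitted because it only changes the meaning of `^` and `$`, which this pattern does not use. The lookbehind already holds right after the marker, but `\S` cannot match the blank lines, so the first position where the whole pattern succeeds is the `t` of "this".

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text modeled on the question (assumes "\n" line endings).
string input =
    "-- #begin free text\n\n" +
    "this is the first bit of text I want to match\n" +
    "blah blah blah\n" +
    "this is the end of the matching text\n\n" +
    "-- #end free text";

var myRegex = new Regex(@"(?s)(?<=-- #begin free text\s*)\S.*?(?=\s*-- #end free text)");
Match m = myRegex.Match(input);

// \S skips the two newlines, so the match starts at the first real character.
int expectedStart = input.IndexOf("this is the first", StringComparison.Ordinal);
Console.WriteLine(m.Index == expectedStart);  // True
// The lazy .*? stops before the whitespace that precedes the end delimiter.
Console.WriteLine(m.Value == m.Value.Trim()); // True
```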
Upvotes: 1
