Reputation: 1811
I'm trying to come up with a regular expression that will stop at the first occurrence of </ol>. My current regex sort of works, but only if </ol> has spaces on either end. For instance, instead of stopping at the first instance in the line below, it stops at the second:
some random text <a href = "asdf">and HTML</a></ol></b> bla </ol>
Here's the pattern I'm currently using: string pattern = @"some random text(.|\r|\n)*</ol>";
What am I doing wrong?
Upvotes: 0
Views: 94
Reputation: 92986
Others have already explained the missing ? that makes the quantifier non-greedy. I also want to suggest another change.
I don't like your (.|\r|\n) part. If you have only single characters in your alternation, it's simpler and easier to read as a character class (it may also be more efficient, though I don't know how the regex compiler treats it). Be careful, though: inside a character class the . is a literal dot, so [.\r\n] would not do the same thing; a class such as [\s\S] matches any character, including newlines.
But in your particular case, where the only alternatives to the . are newline characters, a character class is not the right tool either. Here you should do this:
Regex A = new Regex(@"some random text.*?</ol>", RegexOptions.Singleline);
Use the Singleline modifier. It simply makes the . match newline characters as well.
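As a minimal sketch of the Singleline suggestion, the helper below applies the pattern to input that spans more than one line. The class name FirstOlDemo and the method name MatchThroughFirstOl are made up for illustration and are not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class FirstOlDemo
{
    // Hypothetical helper: returns the text from "some random text" through
    // the first </ol>, or an empty string when nothing matches.
    public static string MatchThroughFirstOl(string input)
    {
        // Singleline lets '.' match '\n' as well; '*?' is lazy, so the match
        // ends at the first </ol> rather than the last one.
        var regex = new Regex(@"some random text.*?</ol>", RegexOptions.Singleline);
        return regex.Match(input).Value;
    }
}
```

Calling FirstOlDemo.MatchThroughFirstOl on a string with a line break before the first </ol> should still return everything up to and including that first </ol>.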
Upvotes: 0
Reputation: 34395
This regex matches everything from the beginning of the string up to (but not including) the first </ol>. It uses Friedl's "unrolling-the-loop" technique, so it is quite efficient:
Regex pattern = new Regex(
@"^[^<]*(?:(?!</ol\b)<[^<]*)*(?=</ol\b)",
RegexOptions.IgnoreCase);
string resultString = pattern.Match(text).Value;
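To make the behavior of this pattern easier to try out, here is a small sketch that wraps it in a helper. The names UnrolledLoopDemo and TextBeforeFirstOl are illustrative assumptions. Because the pattern ends in a lookahead, the returned text stops just before the first </ol> and does not include the tag itself.

```csharp
using System.Text.RegularExpressions;

public static class UnrolledLoopDemo
{
    // Same pattern as in the answer: runs of non-'<' characters, then any '<'
    // that does not start "</ol", repeated until "</ol" is next.
    private static readonly Regex BeforeFirstOl = new Regex(
        @"^[^<]*(?:(?!</ol\b)<[^<]*)*(?=</ol\b)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the text preceding the first </ol>,
    // or an empty string when the input contains no </ol> at all.
    public static string TextBeforeFirstOl(string input)
    {
        return BeforeFirstOl.Match(input).Value;
    }
}
```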
Upvotes: 0
Reputation: 28530
While not a Regex, why not simply use the Substring functions, like:
string returnString = someRandomText.Substring(0, someRandomText.IndexOf("</ol>"));
That would seem to be a lot easier than coming up with a Regex to cover all the possible varieties of characters, spaces, etc.
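One thing to watch with this approach is that IndexOf returns -1 when </ol> is not present, and Substring throws on a negative length. A minimal sketch that guards against this follows; the names SubstringDemo and TextBeforeFirstOl are hypothetical, and returning the whole input when there is no </ol> is an assumption about the desired behavior.

```csharp
using System;

public static class SubstringDemo
{
    // Hypothetical helper: returns everything before the first </ol>.
    // Assumption: when no </ol> exists, the input is returned unchanged.
    public static string TextBeforeFirstOl(string input)
    {
        // Ordinal comparison avoids culture-sensitive surprises when searching markup.
        int index = input.IndexOf("</ol>", StringComparison.Ordinal);
        return index < 0 ? input : input.Substring(0, index);
    }
}
```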
Upvotes: 1
Reputation: 101614
Make your wild-card "ungreedy" by adding a ?, e.g.
some random text(.|\r|\n)*?</ol>
                          ^- Addition
This will make the regex match as few characters as possible, instead of as many as possible (the default behavior).
Oh, and regex shouldn't parse [X]HTML.
Upvotes: 2
Reputation: 14561
string pattern = @"some random text(.|\r|\n)*?</ol>";
Note the question mark after the star: it makes the quantifier non-greedy, which means it will capture as little as possible, whereas the default greedy quantifier captures as much as possible.
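The difference can be seen by running both patterns against the sample line from the question. This is only an illustrative sketch; the class and member names (GreedyVsLazyDemo, Sample, Greedy, Lazy) are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // The sample line from the question.
    public const string Sample =
        "some random text <a href = \"asdf\">and HTML</a></ol></b> bla </ol>";

    // Greedy '*' consumes as much as possible, so the match ends at the LAST </ol>.
    public static string Greedy(string input)
    {
        return Regex.Match(input, @"some random text(.|\r|\n)*</ol>").Value;
    }

    // Lazy '*?' consumes as little as possible, so the match ends at the FIRST </ol>.
    public static string Lazy(string input)
    {
        return Regex.Match(input, @"some random text(.|\r|\n)*?</ol>").Value;
    }
}
```

With the sample line, Greedy returns the entire line, while Lazy stops right after the first </ol>.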
Upvotes: 3