Reputation: 427
start123start123
start123endstart345end
start567endstart789end
I need to extract every set of data (shown in bold) that sits between a start and an end in the string above.
My code:
Regex re = new Regex("start(.*)end", RegexOptions.Singleline);
foreach (Match m in re.Matches(text)) dosomething();
The only extracted text will be 789
The problem is that I don't know in advance how many start/end-delimited pieces of text need to be extracted. I want my regular expression to skip only the first two start tokens, but the greedy regex skips every start up to the last one.
Can it be stopped after it matches the first end text?
If not, is there an option to match the text from the back?
Update:
Actually, my original code uses a non-greedy regex.
The extracted text will be 123start123\r\nstart123 , 345 , 567 , 789
The RegexOptions.Singleline option is necessary in my real case; I am simplifying the case here so that it is easier to follow.
Update 2:
My expected output is 123 , 345 , 567 , 789
Upvotes: 3
Views: 233
Reputation: 70732
The * is a greedy operator. Therefore, .* will match as much as it can while still allowing the remainder of the regular expression to match. To get a non-greedy match, use *? instead:
start(.*?)end
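A minimal sketch of the difference, assuming the sample input from the question (the class and method names are illustrative and not part of the original post). It also shows why the lazy quantifier alone still captures 123start123\nstart123 for the first match: the engine always begins at the leftmost start, and laziness only shortens the right-hand side of the match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Sample input from the question, written with explicit "\n" line breaks.
    public const string Input =
        "start123start123\nstart123endstart345end\nstart567endstart789end";

    // Returns the first capture group of every match of the given pattern.
    public static List<string> Captures(string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(Input, pattern, RegexOptions.Singleline))
            results.Add(m.Groups[1].Value);
        return results;
    }

    public static void Main()
    {
        // Greedy: a single match running from the first "start" to the last "end".
        Console.WriteLine(Captures("start(.*)end").Count);

        // Lazy: each match stops at the nearest "end",
        // but the first one still begins at the leftmost "start".
        foreach (string s in Captures("start(.*?)end"))
            Console.WriteLine(s.Replace("\n", "\\n"));
    }
}
```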
If I understand your problem correctly, you can use a negative lookahead, so that the captured text can never contain another start.
String s = @"start123start123
start123endstart345end
start567endstart789end";
Regex re = new Regex(@"(?s)start((?:(?!start).)*)end");
foreach (Match m in re.Matches(s))
    Console.WriteLine(m.Groups[1].Value);
Output
123
345
567
789
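The question also asks whether the text can be matched from the back. As a hedged alternative to the lookahead, .NET's RegexOptions.RightToLeft combined with the lazy quantifier should give the same captures, because the engine then anchors on each end first and extends leftwards only as far as the nearest start. The sketch below assumes the sample input from the question; the class and method names are illustrative. Results are sorted by index so the output order does not depend on the scan direction.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RightToLeftDemo
{
    // Extracts the text between each "end" and the nearest "start" to its left.
    public static List<string> Extract(string input)
    {
        var re = new Regex("start(.*?)end",
            RegexOptions.Singleline | RegexOptions.RightToLeft);

        return re.Matches(input)
                 .Cast<Match>()
                 .OrderBy(m => m.Index)          // restore left-to-right order
                 .Select(m => m.Groups[1].Value)
                 .ToList();
    }

    public static void Main()
    {
        string input =
            "start123start123\nstart123endstart345end\nstart567endstart789end";
        Console.WriteLine(string.Join(" , ", Extract(input)));
    }
}
```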
Upvotes: 3
Reputation: 174706
C# code to get only the numbers between the start and end strings:
{
String input = @"start123start123
start123endstart345end
start567endstart789end";
Regex rgx = new Regex(@"(?<=start)\d+(?=end)");
foreach (Match m in rgx.Matches(input))
    Console.WriteLine(m.Value);
}
Explanation:
(?<=start)\d+
A lookbehind asserts that the current position is immediately preceded by the given pattern. In our case, the digits must come right after the string start.
\d+(?=end)
Matches one or more digits that must be followed by the string end.
Upvotes: 1
Reputation: 4542
If you need to get only the numbers between start and end, excluding the words start and end themselves:
Regex reg = new Regex(@"(?<=start)[0-9]*(?=end)");
string test = "start123endstart345end";
var resultings = reg.Matches(test);
It will match 123, 345, 567 and 789 in the string you showed:
start123endstart345end
start567endstart789end
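A small sketch of how those matches could be enumerated, assuming the two sample lines above joined with "\n" (the class and method names are illustrative). One point worth flagging: [0-9]* also accepts an empty run of digits, so an input such as startend would produce an empty match; using [0-9]+ instead avoids that.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Returns the value of every match; no capture group is needed because
    // the lookbehind and lookahead are not part of the matched text.
    public static List<string> Extract(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            results.Add(m.Value);
        return results;
    }

    public static void Main()
    {
        string test = "start123endstart345end\nstart567endstart789end";
        Console.WriteLine(string.Join(" , ", Extract(test, @"(?<=start)[0-9]*(?=end)")));

        // "startend" contains no digits: * still matches once (empty), + does not.
        Console.WriteLine(Extract("startend", @"(?<=start)[0-9]*(?=end)").Count);
        Console.WriteLine(Extract("startend", @"(?<=start)[0-9]+(?=end)").Count);
    }
}
```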
Upvotes: 2