Isolet Chan

Reputation: 427

Regular expression matching from the end

start123start123
start123endstart345end
start567endstart789end

I need to extract every piece of data that sits between a start and its matching end in the string above.

My code:

Regex re = new Regex("start(.*)end", RegexOptions.Singleline);
foreach (Match m in re.Matches(text)) dosomething();

The only extracted text will be 789

The problem is that I don't know in advance how many start...end blocks need to be extracted. I want the regular expression to skip only the first two start tokens, but the greedy regex skips every start up to the last one.

Can the match be made to stop at the first end?

If not, is there an option to match the text from the back?

Update:

Actually, my original code uses a non-greedy regex.

With it, the extracted text is 123start123\r\nstart123 , 345 , 567 , 789

RegexOptions.Singleline is necessary in my real case because the data spans newlines. I am simplifying the case here so that it is easier to follow.

Update 2:

My expected output is 123 , 345 , 567 , 789

Upvotes: 3

Views: 233

Answers (3)

hwnd

Reputation: 70732

The * is a greedy operator. Therefore, .* will match as much as it can while still allowing the remainder of the regular expression to match. To get a non-greedy match, use *? instead:

start(.*?)end
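
To make the difference concrete, here is a small sketch that runs both patterns over the sample text from the question. The class name GreedyVsLazy and the helper Captures are made up for illustration, and the input uses plain \n line breaks as an assumption (the question's real data has \r\n).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical helper: returns capture group 1 of every match of `pattern`.
    public static List<string> Captures(string pattern, string text)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern, RegexOptions.Singleline))
            result.Add(m.Groups[1].Value);
        return result;
    }

    public static void Main()
    {
        // Assumed input: the question's sample with \n line breaks.
        string text = "start123start123\nstart123endstart345end\nstart567endstart789end";

        // Greedy: a single match running from the first start to the last end.
        Console.WriteLine(Captures("start(.*)end", text).Count);

        // Lazy: each match stops at the nearest end, but the first match
        // still begins at the very first start.
        foreach (string s in Captures("start(.*?)end", text))
            Console.WriteLine(s.Replace("\n", "\\n"));
    }
}
```

The lazy version yields four matches, but the first one still swallows the unmatched start tokens (123start123\nstart123), which is the behaviour described in the question's update. Laziness limits how far a match extends to the right; it does not change where the match begins.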

Edit

If I understand your problem correctly, you can use a negative lookahead so that the captured text is never allowed to run across another start:

String s = @"start123start123
start123endstart345end
start567endstart789end";

Regex re = new Regex(@"(?s)start((?:(?!start).)*)end");

foreach (Match m in re.Matches(s))
    Console.WriteLine(m.Groups[1].Value);

Output

123
345
567
789
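
The pattern can be read piece by piece. The following is a sketch of the same expression spread over several lines, assuming RegexOptions.IgnorePatternWhitespace so that the pattern may carry # comments, and RegexOptions.Singleline in place of the inline (?s). The names TemperedDemo and Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemperedDemo
{
    // Same expression as above, annotated. Whitespace and # comments inside
    // the pattern are ignored because of IgnorePatternWhitespace.
    private static readonly Regex Re = new Regex(@"
        start                 # opening marker
        (                     # group 1: the text we want
          (?: (?!start) . )*  # one char at a time, refusing to step onto another start
        )
        end                   # closing marker
        ", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Hypothetical helper: collects group 1 of every match.
    public static List<string> Extract(string text)
    {
        var result = new List<string>();
        foreach (Match m in Re.Matches(text))
            result.Add(m.Groups[1].Value);
        return result;
    }

    public static void Main()
    {
        // Assumed input: the question's sample with \n line breaks.
        string text = "start123start123\nstart123endstart345end\nstart567endstart789end";
        Console.WriteLine(string.Join(", ", Extract(text)));
    }
}
```

Because the inner group can never cross a second start, an attempt that begins at a start with no end of its own simply fails, and the engine moves on to the next start.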

Upvotes: 3

Avinash Raj

Reputation: 174706

C# code to get only the numbers between the start and end strings:

{
    String input = @"start123start123
start123endstart345end
start567endstart789end";
    Regex rgx = new Regex(@"(?<=start)\d+(?=end)");
    foreach (Match m in rgx.Matches(input))
        Console.WriteLine(m.Value);
}

Explanation:

  • (?<=start) is a lookbehind: it asserts that the text immediately before the current position is start, without making start part of the match.
  • \d+(?=end) matches one or more digits, and the lookahead (?=end) requires them to be followed by end, again without consuming it.
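
Since both lookarounds are zero-width, the match itself holds only the digits. A minimal sketch that prints each match with its position makes this visible; the class name LookaroundDemo, the helper Describe and the shortened input are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: describes each match as "value@index".
    public static List<string> Describe(string input)
    {
        var rgx = new Regex(@"(?<=start)\d+(?=end)");
        var result = new List<string>();
        foreach (Match m in rgx.Matches(input))
            result.Add(m.Value + "@" + m.Index);
        return result;
    }

    public static void Main()
    {
        // start occupies indexes 0-4, so the first digits begin at index 5.
        foreach (string line in Describe("start123endstart345end"))
            Console.WriteLine(line);
    }
}
```

The reported indexes point at the first digit, not at the s of start, which shows that neither start nor end is included in m.Value.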

Upvotes: 1

terrybozzio

Reputation: 4542

If you need to get only the numbers between start and end, excluding the words start and end themselves:

Regex reg = new Regex(@"(?<=start)[0-9]*(?=end)");
string test = "start123endstart345end";
var resultings = reg.Matches(test);

Applied to the string you showed, it will match 123, 345, 567 and 789:

start123endstart345end

start567endstart789end

Upvotes: 2
