Reputation: 193362
What do I have to change in this regular expression so that, in both cases below, it captures the text before the first colon as the "label" and all the rest of the text as the "text"?
using System;
using System.Text.RegularExpressions;

namespace TestRegex92343
{
    class Program
    {
        static void Main(string[] args)
        {
            {
                //THIS WORKS:
                string line = "title: The Way We Were";
                Regex regex = new Regex(@"(?<label>.+):\s*(?<text>.+)");
                Match match = regex.Match(line);
                Console.WriteLine("LABEL IS: {0}", match.Groups["label"]); //"title"
                Console.WriteLine("TEXT IS: {0}", match.Groups["text"]); //"The Way We Were"
            }
            {
                //THIS DOES NOT WORK:
                string line = "title: The Way We Were: A Study of Youth";
                Regex regex = new Regex(@"(?<label>.+):\s*(?<text>.+)");
                Match match = regex.Match(line);
                Console.WriteLine("LABEL IS: {0}", match.Groups["label"]);
                //GETS "title: The Way We Were"
                //SHOULD GET: "title"
                Console.WriteLine("TEXT IS: {0}", match.Groups["text"]);
                //GETS: "A Study of Youth"
                //SHOULD GET: "The Way We Were: A Study of Youth"
            }
            Console.ReadLine();
        }
    }
}
Upvotes: 0
Views: 726
Reputation: 73021
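Regular expression quantifiers are greedy by default, and . matches any character except a newline. That's why label grabs everything up to the last colon ("title: The Way We Were") instead of stopping at the first one. If your labels are always a single word (letters, digits, or underscores), I would recommend the following: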
(?<label>\w+):\s*(?<text>.+)
Otherwise, you could make the label's quantifier lazy (non-greedy) by adding a ?:
(?<label>.+?):\s*(?<text>.+)
In general, avoid a greedy catch-all like .+ where you can, and match specifically what you want.
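A minimal sketch comparing the question's greedy pattern with the two suggestions above on the problem line. The class PatternComparison and its Split helper are hypothetical names used only for illustration. The greedy pattern should capture "title: The Way We Were" as the label, while the \w+ and lazy .+? patterns both stop at the first colon and capture "title".

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Hypothetical helper: returns the "label" and "text" groups captured by the pattern.
    public static (string Label, string Text) Split(string pattern, string line)
    {
        Match match = Regex.Match(line, pattern);
        return (match.Groups["label"].Value, match.Groups["text"].Value);
    }

    public static void Main()
    {
        string[] patterns =
        {
            @"(?<label>.+):\s*(?<text>.+)",   // greedy: label runs to the LAST colon
            @"(?<label>\w+):\s*(?<text>.+)",  // word characters only: label cannot contain a colon
            @"(?<label>.+?):\s*(?<text>.+)",  // lazy: label stops at the FIRST colon
        };
        string line = "title: The Way We Were: A Study of Youth";

        foreach (string pattern in patterns)
        {
            var (label, text) = Split(pattern, line);
            Console.WriteLine("{0}  LABEL=\"{1}\"  TEXT=\"{2}\"", pattern, label, text);
        }
    }
}
```

Note that \w+ is not anchored, so a label containing a space (e.g. "my title: ...") would match only its last word; the lazy version keeps the whole label.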
Upvotes: 2
Reputation: 284927
new Regex(@"(?<label>[^:]+):\s*(?<text>.+)");
This simply replaces the dot in the label group with the character class [^:], which matches any character except a colon, so the label can never extend past the first colon.
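A small sketch applying this pattern to both lines from the question, plus a line with no colon to show that Match.Success should be checked before reading the groups. LabelParser and TryParse are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LabelParser
{
    // [^:]+ cannot consume a colon, so the label always ends at the first colon.
    private static readonly Regex LabelRegex = new Regex(@"(?<label>[^:]+):\s*(?<text>.+)");

    // Hypothetical helper: returns false when the line has no "label: text" form.
    public static bool TryParse(string line, out string label, out string text)
    {
        Match match = LabelRegex.Match(line);
        label = match.Success ? match.Groups["label"].Value : string.Empty;
        text = match.Success ? match.Groups["text"].Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        string[] lines =
        {
            "title: The Way We Were",
            "title: The Way We Were: A Study of Youth",
            "no colon here",
        };

        foreach (string line in lines)
        {
            if (TryParse(line, out string label, out string text))
                Console.WriteLine("LABEL IS: {0} | TEXT IS: {1}", label, text);
            else
                Console.WriteLine("NO MATCH: {0}", line);
        }
    }
}
```

Compared with the lazy .+? version, the negated character class states the intent directly and does not need to backtrack one character at a time to find the first colon.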
Upvotes: 3