Edward Tanguay

Reputation: 193362

How can I change this regular expression so that it grabs the text before the FIRST colon and ignores any later colons?

What do I have to change in this regular expression so that, in both cases below, it gets the text before the first colon as the "label" and all the rest of the text as the "text"?

using System;
using System.Text.RegularExpressions;

namespace TestRegex92343
{
    class Program
    {
        static void Main(string[] args)
        {
            {
                //THIS WORKS:
                string line = "title: The Way We Were";
                Regex regex = new Regex(@"(?<label>.+):\s*(?<text>.+)");
                Match match = regex.Match(line);
                Console.WriteLine("LABEL IS: {0}", match.Groups["label"]); //"title"
                Console.WriteLine("TEXT IS: {0}", match.Groups["text"]); //"The Way We Were"
            }

            {
                //THIS DOES NOT WORK:
                string line = "title: The Way We Were: A Study of Youth";
                Regex regex = new Regex(@"(?<label>.+):\s*(?<text>.+)");
                Match match = regex.Match(line);

                Console.WriteLine("LABEL IS: {0}", match.Groups["label"]);
                //GETS "title: The Way We Were"
                //SHOULD GET: "title"

                Console.WriteLine("TEXT IS: {0}", match.Groups["text"]); 
                //GETS: "A Study of Youth"
                //SHOULD GET: "The Way We Were: A Study of Youth"
            }

            Console.ReadLine();

        }
    }
}

Upvotes: 0

Views: 726

Answers (2)

Jason McCreary

Reputation: 73021

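Regular expression quantifiers are greedy by default, and the . matches any character (except a newline). That's why label captures everything up to the last colon instead of the first. If your labels are always just word characters (letters, digits, and underscores), I would recommend the following: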

(?<label>\w+):\s*(?<text>.+)

Otherwise, you could make the quantifier lazy (non-greedy) by adding a ? after the +:

(?<label>.+?):\s*(?<text>.+)

As a rule, avoid broad greedy patterns such as .+ where you can, and try to match specifically what you want.
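To make the difference concrete, here is a minimal sketch that runs the original greedy pattern and the two alternatives above against the line that failed in the question. The class name GreedyVsLazyDemo and the helper Label are made up for illustration; only Regex.Match and the named group come from the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyVsLazyDemo
{
    // Hypothetical helper: returns the "label" group for a pattern,
    // or an empty string when the pattern does not match at all.
    public static string Label(string pattern, string line)
    {
        Match match = Regex.Match(line, pattern);
        return match.Success ? match.Groups["label"].Value : "";
    }

    static void Main()
    {
        string line = "title: The Way We Were: A Study of Youth";

        // Greedy .+ backtracks only as far as the LAST colon.
        Console.WriteLine(Label(@"(?<label>.+):\s*(?<text>.+)", line));  // title: The Way We Were

        // Lazy .+? stops at the FIRST colon.
        Console.WriteLine(Label(@"(?<label>.+?):\s*(?<text>.+)", line)); // title

        // \w+ cannot cross the colon or the spaces, so it also stops at "title".
        Console.WriteLine(Label(@"(?<label>\w+):\s*(?<text>.+)", line)); // title
    }
}
```

Note that the \w+ variant only suits labels without spaces or punctuation; for a label such as "sub title" it would capture just "title".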

Upvotes: 2

Matthew Flaschen

Reputation: 284927

new Regex(@"(?<label>[^:]+):\s*(?<text>.+)");

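This simply replaces the dot in the label group with the negated character class [^:], which matches any character except a colon. The label therefore can never extend past the first colon, while the text group still takes everything after it.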
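As a rough sketch of how this pattern behaves on both lines from the question, the snippet below wraps it in a small helper. The names LabelSplitter and TrySplit are hypothetical and not part of any library; the pattern itself is the one shown above.

```csharp
using System;
using System.Text.RegularExpressions;

static class LabelSplitter
{
    // [^:]+ matches one or more characters that are not a colon,
    // so the label always ends at the first colon in the line.
    static readonly Regex Pattern = new Regex(@"(?<label>[^:]+):\s*(?<text>.+)");

    // Hypothetical helper: returns false when the line has no "label: text" shape.
    public static bool TrySplit(string line, out string label, out string text)
    {
        Match match = Pattern.Match(line);
        label = match.Success ? match.Groups["label"].Value : "";
        text = match.Success ? match.Groups["text"].Value : "";
        return match.Success;
    }

    static void Main()
    {
        string[] lines =
        {
            "title: The Way We Were",
            "title: The Way We Were: A Study of Youth"
        };

        foreach (string line in lines)
        {
            string label, text;
            if (TrySplit(line, out label, out text))
            {
                Console.WriteLine("LABEL IS: {0}", label);
                Console.WriteLine("TEXT IS: {0}", text);
            }
        }
        // Second line prints:
        // LABEL IS: title
        // TEXT IS: The Way We Were: A Study of Youth
    }
}
```

Unlike \w+, the [^:]+ class also accepts labels that contain spaces or punctuation, as long as they contain no colon.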

Upvotes: 3
