sflee

Reputation: 1719

How to use Regex in C# to extract multiple substrings from a string

I searched the web but found only a partial solution, so I am asking this question.

Input:

[A] this is A, and , [B] this is B, and hello , [C] this is C - From Here

I want to get this list:

list[0] == "this is A, and"
list[1] == "this is B, and hello"
list[2] == "this is C"
list[3] == "From Here"

I found that I should use something like this:

Regex pattern = new Regex(@"^\[A\] (.*) , \[B\] (.*) , \[C\] (.*) - (.*)$");
List<string> matches = pattern.Matches(input).OfType<Match>().Select(m => m.Value).Distinct().ToList();

But it does not work. How can I make it work? Thanks.

Upvotes: 0

Views: 1113

Answers (2)

zenitex

Reputation: 41

The regex is correct; the only thing you need to do is iterate over the groups of the match. Group 0 is always the whole match (in your case, the entire sentence), so you can simply skip the first item.
P.S. Of course, don't forget to check that the match actually succeeded. Also, if this function will be executed many times, I recommend moving the regex into a static member of your class so it is not constructed on every call (for performance and memory usage).

private static readonly Regex pattern = new Regex(@"^\[A\] (.*) , \[B\] (.*) , \[C\] (.*) - (.*)$");

The final version of the method (with the pattern as a static member) looks like this:

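public static List<string> GetMatches(string input)
{
    var matchResult = pattern.Match(input);
    if (matchResult.Success)
    {
        // Group 0 is the whole match, so skip it and keep groups 1..4.
        // GroupCollection.Values needs .NET Core 3.0 or later;
        // on .NET Framework use matchResult.Groups.Cast<Group>() instead.
        return matchResult.Groups.Values
            .Skip(1)
            .Select(x => x.Value)
            .ToList();
    }

    // No match: return an empty list.
    return new List<string>();
}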
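For reference, here is a minimal self-contained sketch that puts the static pattern and the GetMatches method together in one console program. The class name Program and the Main method are only for illustration, and it uses .Cast<Group>() instead of .Groups.Values so that it should also compile on .NET Framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Program
{
    // Built once and reused on every call instead of being constructed per call.
    private static readonly Regex pattern =
        new Regex(@"^\[A\] (.*) , \[B\] (.*) , \[C\] (.*) - (.*)$");

    public static List<string> GetMatches(string input)
    {
        Match matchResult = pattern.Match(input);
        if (!matchResult.Success)
        {
            // No match: return an empty list rather than null.
            return new List<string>();
        }

        // Group 0 is the whole match; groups 1..4 are the captured substrings.
        return matchResult.Groups
            .Cast<Group>()
            .Skip(1)
            .Select(g => g.Value)
            .ToList();
    }

    public static void Main()
    {
        string input = "[A] this is A, and , [B] this is B, and hello , [C] this is C - From Here";
        foreach (string s in GetMatches(input))
        {
            Console.WriteLine(s);
        }
        // Prints:
        // this is A, and
        // this is B, and hello
        // this is C
        // From Here
    }
}
```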

Upvotes: 2

Thomas Weller
Thomas Weller

Reputation: 59302

The problem is a confusion between matches and groups. The regex matches only once, but that single match contains several groups. Access the first match with [0], then use .OfType<Group>() on its Groups:

List<string> matches = pattern.Matches(input)[0].Groups.OfType<Group>().Select(m => m.Value).Distinct().ToList();

This will give you 5 results: the whole match (group 0) followed by the four captured substrings.

[LinqPad screenshot of the five results]

You can get rid of the first one with .Skip(1) (placed before .ToList()) or with matches.RemoveAt(0).
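To make the match/group distinction concrete, here is a small sketch that prints how many matches and groups the question's input produces and then drops group 0 with .Skip(1). The class name Demo and the helper ExtractParts are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Demo
{
    private static readonly Regex Pattern =
        new Regex(@"^\[A\] (.*) , \[B\] (.*) , \[C\] (.*) - (.*)$");

    // Hypothetical helper: captured substrings of the first match, group 0 skipped.
    public static List<string> ExtractParts(string input)
    {
        MatchCollection matches = Pattern.Matches(input);
        if (matches.Count == 0)
        {
            return new List<string>();
        }

        return matches[0].Groups
            .OfType<Group>()
            .Skip(1)               // drop group 0, the whole match
            .Select(g => g.Value)
            .ToList();
    }

    public static void Main()
    {
        string input = "[A] this is A, and , [B] this is B, and hello , [C] this is C - From Here";
        MatchCollection matches = Pattern.Matches(input);

        Console.WriteLine(matches.Count);           // 1: the anchored pattern matches once
        Console.WriteLine(matches[0].Groups.Count); // 5: group 0 plus four capture groups
        Console.WriteLine(string.Join(" | ", ExtractParts(input)));
        // this is A, and | this is B, and hello | this is C | From Here
    }
}
```

This version leaves out .Distinct(): it is not needed to get the four parts, and it would silently merge two captured substrings that happen to be equal.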

Upvotes: 1
