Mike Christensen

Reputation: 91608

Extracting tokens from a string with regular expressions in .NET

I'm curious if this is even possible with Regex. I want to extract tokens from a string similar to:

Select a [COLOR] and a [SIZE].

Ok, easy enough - I can use (\[[A-Z]+\])

However, I want to also extract the text between the tokens. Basically, I want the matched groups for the above to be:

"Select a "
"[COLOR]"
" and a "
"[SIZE]"
"."

What's the best approach for this? If there's a way to do this with RegEx, that would be great. Otherwise, I'm guessing I have to extract the tokens, then manually loop through the MatchCollection and parse out the substrings based on the indexes and lengths of each Match. Please note I need to preserve the order of the strings and tokens. Is there a better algorithm to do this sort of string parsing?
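For reference, the manual approach described above could be sketched roughly as follows. This is only an illustration: the class and method names (`TokenSplitter`, `SplitManually`) are made up for the example, and it assumes tokens always look like `[UPPERCASE]`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenSplitter
{
    // Hypothetical helper: returns literal text and [TOKEN] pieces in their original order.
    public static List<string> SplitManually(string input)
    {
        var parts = new List<string>();
        int position = 0; // index just past the previous match

        foreach (Match match in Regex.Matches(input, @"\[[A-Z]+\]"))
        {
            // Literal text between the previous token and this one, if any.
            if (match.Index > position)
                parts.Add(input.Substring(position, match.Index - position));

            parts.Add(match.Value);
            position = match.Index + match.Length;
        }

        // Trailing text after the last token, if any.
        if (position < input.Length)
            parts.Add(input.Substring(position));

        return parts;
    }
}
```

Calling `TokenSplitter.SplitManually("Select a [COLOR] and a [SIZE].")` should yield the five strings listed above, in order. Unlike a plain split, this version skips empty pieces when a token sits at the very start or end of the input.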

Upvotes: 7

Views: 4605

Answers (2)

AMissico

Reputation: 21684

Here is an approach that avoids regular expressions and uses String.Split instead. The drawback is that you lose the delimiters: the brackets are dropped from the output.

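        // Requires: using System.Diagnostics;
        string s = "Select a [COLOR] and a [SIZE].";

        // Split on both bracket characters; the brackets themselves are discarded.
        string[] sParts = s.Split('[', ']');

        foreach (string sPart in sParts)
        {
            Debug.WriteLine(sPart);
        }

        // Output:
        // Select a 
        // COLOR
        //  and a 
        // SIZE
        // .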

Upvotes: 0

Kobi

Reputation: 138017

Use Regex.Split(s, @"(\[[A-Z]+\])"). It should give you exactly the array you're after: when the pattern contains a capturing group, Split includes the captured text as elements of the result array.
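A minimal sketch of how that call might be wrapped, in case it helps. The class and method names (`RegexSplitDemo`, `SplitKeepingTokens`) are just placeholders for the example; the only real API used is `Regex.Split`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Hypothetical wrapper around Regex.Split for the [TOKEN] pattern from the question.
    public static string[] SplitKeepingTokens(string input)
    {
        // The capturing parentheses make Regex.Split include each matched token
        // in the returned array, between the surrounding pieces of text.
        return Regex.Split(input, @"(\[[A-Z]+\])");
    }
}
```

For `"Select a [COLOR] and a [SIZE]."` this returns `"Select a "`, `"[COLOR]"`, `" and a "`, `"[SIZE]"`, `"."`. One thing to watch for: if the input begins or ends with a token, Regex.Split puts an empty string at that end of the array, so you may want to filter those out.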

Upvotes: 11
