user1990265

Reputation: 5

How can I split the following string into a string array

I want to split the following:

name[]address[I]dob[]nationality[]occupation[]

So my results would be:

name[]
address[I]
dob[]
nationality[]
occupation[]

I have tried using Regex.Split but can't get these results.

Upvotes: 0

Views: 103

Answers (5)

Joey

Reputation: 354456

You can use Regex.Split with the following regex:

(?<=])(?=[a-z])

which will split between a closing square bracket on the left and a letter on the right. This is done using lookaround assertions. They don't consume any characters of the match, so in this situation they're pretty handy for matching a position between two characters.

Basically it means exactly what I wrote: (?<=]) will match a point in the string preceded by a closing bracket, while (?=[a-z]) matches a point in the string (both zero-width, i.e. between characters) where a letter follows. You can tweak that a little if your input data looks different from what you gave us in the question.

You could also shorten it a little, at the expense of readability, by using (?<=])\b. But I would advise against that, since \b is tied to \w, which is usually a really ugly thing. It would work roughly the same, but not quite: in this context \b amounts to (?=\w), and \w matches a lot more than letters, including decimal digits and the underscore.
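To make the difference concrete, here is a small sketch comparing the two patterns. The input string, class name and helper names are made up for illustration; the input is chosen so that a digit and an underscore follow a closing bracket.

```csharp
using System;
using System.Text.RegularExpressions;

static class BoundaryDemo
{
    // Hypothetical helpers, named here only for illustration.
    public static string[] SplitOnLetter(string s) =>
        Regex.Split(s, @"(?<=])(?=[a-z])");

    public static string[] SplitOnWordBoundary(string s) =>
        Regex.Split(s, @"(?<=])\b");

    static void Main()
    {
        // Made-up input: a digit and an underscore follow the closing brackets.
        const string input = "a[]1b[]_c[]";

        // No lowercase letter follows any ']', so nothing is split.
        Console.WriteLine(string.Join(" | ", SplitOnLetter(input)));       // a[]1b[]_c[]

        // \b also fires before '1' and '_', because both are matched by \w.
        Console.WriteLine(string.Join(" | ", SplitOnWordBoundary(input))); // a[] | 1b[] | _c[]
    }
}
```

Whether the extra split points are wanted depends on the data; for the input in the question both patterns give the same result.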

Quick PowerShell test (it uses the same regex implementation since it's .NET underneath):

PS> 'name[]address[I]dob[]nationality[]occupation[]' -split '(?<=])(?=[a-z])'
name[]
address[I]
dob[]
nationality[]
occupation[]
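Since the question asks for a string array in C#, the same split could look roughly like this. It is a minimal sketch; the class and method names are placeholders, not anything from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Splits at every position that has ']' on the left and a lowercase letter on the right.
    // SplitFields is a hypothetical name chosen for this example.
    public static string[] SplitFields(string input) =>
        Regex.Split(input, @"(?<=])(?=[a-z])");

    static void Main()
    {
        string[] parts = SplitFields("name[]address[I]dob[]nationality[]occupation[]");

        // Prints the five fields from the question, one per line.
        foreach (string part in parts)
            Console.WriteLine(part);
    }
}
```

Because both assertions are zero-width, no characters are removed by the split and the brackets stay attached to their field names.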

Just for completeness, there is also another option. You can either split the string between the tokens you want to retain, or you could just collect all matches of tokens you want to keep. In the latter case you'll need a pattern that matches what you need, such as

[a-z]+\[[^\]]*]

or what Dennis gave as an answer (I just tend to avoid \w and \b except for quick and dirty hacks or golfing since I maintain that they have no useful application). You can use that with Regex.Matches.

Generally both approaches work fine; which one to pick depends on whether the split pattern or the match pattern is easier to understand. Note that Regex.Matches gives you Match objects rather than a string[], so if you need an array of strings you'll also need .Select(m => m.Value) followed by .ToArray().
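A sketch of the match-based variant, assuming the match pattern given above. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class MatchDemo
{
    // Collects every token of the form letters + '[' + anything but ']' + ']'.
    // MatchFields is a hypothetical name chosen for this example.
    public static string[] MatchFields(string input) =>
        Regex.Matches(input, @"[a-z]+\[[^\]]*]")
             .Cast<Match>()          // MatchCollection is non-generic on older frameworks
             .Select(m => m.Value)   // Match -> string
             .ToArray();

    static void Main()
    {
        string[] fields = MatchFields("name[]address[I]dob[]nationality[]occupation[]");
        Console.WriteLine(string.Join(", ", fields));
        // name[], address[I], dob[], nationality[], occupation[]
    }
}
```

Unlike the split, this silently skips any text that does not fit the pattern, which may or may not be what you want.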

In this case I guess neither regex should be left alone without a comment explaining what it does. I can read them just fine, but many developers are a little uneasy around regexes, and more advanced concepts like lookaround in particular often warrant an explanation.

Upvotes: 4

drbald

Reputation: 56

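// Requires: using System.Linq; using System.Text.RegularExpressions;
string inputString = "name[]address[I]dob[]nationality[]occupation[]";
// Lazily matches everything up to the next "[]" or "[I]"; other bracket contents are not covered.
var result = Regex.Matches(inputString, @".*?\[I?\]").Cast<Match>().Select(m => m.Value).ToArray();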

Upvotes: 0

Tommaso Belluzzo

Reputation: 23675

// Split on ']' and re-append the bracket that Split removed from each piece.
string[] result = text.Split(new Char[] { ']' }, StringSplitOptions.RemoveEmptyEntries).Select(s => s + "]").ToArray();

Upvotes: 1

as-cii

Reputation: 13019

A regular expression should be fine. You could also locate the opening and closing square brackets with string.IndexOf, for example:

IEnumerable<string> Results(string input)
{
    int currentIndex = -1;
    while (true)
    {
        currentIndex++;
        int openingBracketIndex = input.IndexOf("[", currentIndex);
        int closingBracketIndex = input.IndexOf("]", currentIndex);

        if (openingBracketIndex == -1 || closingBracketIndex == -1)
            yield break;

        yield return input.Substring(currentIndex, closingBracketIndex - currentIndex + 1);
        currentIndex = closingBracketIndex;
    }
}

Upvotes: 0

Dennis

Reputation: 37770

Use this regex pattern with Regex.Matches:

\w*\[\w*\]

Upvotes: 0
