jhdeval

Reputation: 13

C# String processing a non-delimited string to list

Here is an example of the string in question:

[952,M] [782,M] [782] {2[373,M]} [1470] [352] [235] [234] {3[610]}{3[380]} [128] [127]

I have added the spaces, but they do not really help with the breakdown. What I want to do is take each "field" in square brackets and add it to a string list. Some fields also contain a comma-separated portion, which I can handle by splitting after the fact. The real problem lies in the curly braces. For instance, in {2[373,M]} the number outside the square brackets is the number of times the bracketed field is repeated.

For the life of me I cannot figure out a way to consistently split the line into a string list.

Quasi code follows:

for(i = 0 to string.length)
{
    if string.substring(i,1) = "["
        int start1 = i
    elseif string.substring(i,1) = "]"
        int end1 = i
    elseif string.substring(i,1) = "{"
        int start2 = i
    elseif string.substring(i,1) = "}"
        int end2 = i
}

I thought about using the code idea above to substring out each "field" but the curly braces also contain the square brackets. Any ideas would be greatly appreciated.

Upvotes: 1

Views: 142

Answers (5)

αNerd

Reputation: 528

You can use a regex.

Edited: this handles both the commas and the repetition:

        var regex = new Regex(@"(\B\[([a-zA-Z0-9\,]+)\])|(\{(\d+)\[([a-zA-Z0-9\,]+)\]\})");
        var stringOne = "[952,M] [782,M] [782] {2[373,M]} [1470] [352] [235] [234] {3[610]}{3[380]} [128] [127]";
        var matches = regex.Matches(stringOne);

        var listStrings = new List<string>();

        foreach (Match match in matches)
        {
            var repetitor = 1;
            string value = null;
            if (match.Groups[1].Value == string.Empty)
            {
                repetitor = int.Parse(match.Groups[4].Value);
                value = match.Groups[5].Value;
            }

            else
            {
                value = match.Groups[2].Value;
            }

            var values = value.Split(',');
            for (var i = 0; i < repetitor; i++)
            {
                listStrings.AddRange(values);
            }
        }

Upvotes: 0

konkked

Reputation: 3231

If I understand you correctly, you want to split the characters surrounded by brackets, and when they have curly brackets repeat the content inside the specified number of times.

You can extract all the information you need with a regex, including the count that tells you how many times to repeat a bracketed field.

var input = @"[952,M] [782,M] [782] {2[373,M]} 
              [1470] [352] [235] [234] {3[610]}{3[380]} [128] [127]";

var pattern = @"((:?\{(\d+)(.*?)\})|(:?\[.*?\]))";

MatchCollection matches = Regex.Matches(input, pattern);

var ls = new List<string>();

foreach(Match match in matches)
{
    // check if the item has curly brackets
    // The captures groups will be different if there were curly brackets

    // If there are curly brackets, then the 4th capture group 
    // will hold the square brackets and their content
    if( match.Groups[4].Success ) 
    {
        var value = match.Groups[4].Value;

        // The "Count" of the items will 
        // be in the third capture group
        var count = int.Parse(match.Groups[3].Value);

        for(int i=0;i<count;i++)
        {
            ls.Add(value);
        } 

    }
    else
    {
        // otherwise we know that square bracket input 
        // is in the first capture group
        ls.Add(match.Groups[1].Value);
    }
}

Here is a working fiddle of the solution: https://dotnetfiddle.net/4rQsDj

Here is the output:

[952,M]
[782,M]
[782]
[373,M]
[373,M]
[1470]
[352]
[235]
[234]
[610]
[610]
[610]
[380]
[380]
[380]
[128]
[127]

If you don't want the brackets, you can get rid of them by changing the regex pattern to (:?(:?\{(\d+)\[(.*?)\]\})|(:?\[(.*?)\])), and match.Groups[1].Value to match.Groups[6].Value.

Here is the working solution without square brackets: https://dotnetfiddle.net/OQwStf
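
Counting group numbers by hand is easy to get wrong, so here is a rough sketch of the same bracket-less idea using named groups instead of indices. This swaps in a different pattern from the one in the answer; the method name ExtractFields and the group names count, rep and single are made up for illustration. It is written as a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
static List<string> ExtractFields(string input)
{
    // "count" and "rep" match the {n[...]} form; "single" matches a plain [...] field.
    var pattern = @"\{(?<count>\d+)\[(?<rep>.*?)\]\}|\[(?<single>.*?)\]";
    var ls = new List<string>();

    foreach (Match match in Regex.Matches(input, pattern))
    {
        if (match.Groups["count"].Success)
        {
            // Curly-brace form: add the inner text "count" times.
            int count = int.Parse(match.Groups["count"].Value);
            for (int i = 0; i < count; i++)
                ls.Add(match.Groups["rep"].Value);
        }
        else
        {
            // Plain form: add the inner text once.
            ls.Add(match.Groups["single"].Value);
        }
    }
    return ls;
}

var extracted = ExtractFields("[782] {2[373,M]} {3[610]}");
Console.WriteLine(string.Join(" | ", extracted));
```

The named groups mean the code no longer depends on how many parentheses come before the one you care about, which matters if the pattern is edited later.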

Upvotes: 1

Slai

Reputation: 22876

var s = "[952,M] [782,M] [782] {2[373,M]} [1470] [352] [235] [234] {3[610]}{3[380]} [128] [127]";

var s2 = Regex.Replace(s, @"\{(\d+)(\[[^]]+\])\}", m => string.Concat( 
    Enumerable.Repeat(m.Groups[2].Value, int.Parse(m.Groups[1].Value))));

var a = s2.Split("[] ".ToArray(), StringSplitOptions.RemoveEmptyEntries);

// s2 = "[952,M] [782,M] [782] [373,M][373,M] [1470] [352] [235] [234] [610][610][610][380][380][380] [128] [127]"
// a = {"952,M","782,M","782","373,M","373,M","1470","352","235","234","610","610","610","380","380","380","128","127"}

Upvotes: 1

Jonathan Wood

Reputation: 67193

While you might be able to get by on RegEx, it may come up short if your needs grow too complex. So the code below shows the general approach I would take to accomplish this. It's a little quick and dirty but meets your requirements.

In addition, I have a parsing helper class that would make this code easier to write and more robust.

string input = "[952,M] [782,M] [782] {2[373,M]} [1470] [352] [235] [234] {3[610]}{3[380]} [128] [127]";
int pos = 0;

void Main()
{
    while (pos < input.Length)
    {
        SkipWhitespace();
        if (pos < input.Length && input[pos] == '{')
            ParseBrace();
        else if (pos < input.Length && input[pos] == '[')
            ParseBracket();
        else
            pos++; // skip any unexpected character so the loop always advances
    }
}

void SkipWhitespace()
{
    while (pos < input.Length && char.IsWhiteSpace(input[pos]))
        pos++;
}

void ParseBrace()
{
    Debug.Assert(pos < input.Length && input[pos] == '{');
    int pos2 = input.IndexOf('[', pos + 1);
    if (pos2 < 0)
        pos2 = input.Length;

    int count = int.Parse(input.Substring(pos + 1, pos2 - pos - 1));
    for (int i = 0; i < count; i++)
    {
        pos = pos2;
        ParseBracket();
    }

    pos2 = input.IndexOf('}', pos2 + 1);
    if (pos2 < 0)
        pos2 = input.Length;

    pos = pos2 + 1;
}

void ParseBracket()
{
    Debug.Assert(pos < input.Length && input[pos] == '[');
    int pos2 = input.IndexOf(']', pos + 1);
    if (pos2 < 0)
        pos2 = input.Length;
    Console.WriteLine(input.Substring(pos + 1, pos2 - pos - 1));
    pos = pos2 + 1;
}

Sample output:

952,M
782,M
782
373,M
373,M
1470
352
235
234
610
610
610
380
380
380
128
127
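
Since the question asks for a string list rather than console output, here is a minimal sketch of the same hand-written scanning idea that collects the fields into a List<string>. The method name ParseFields and the choice to stop on malformed input are assumptions for illustration, not part of the code above. It is written as a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of the scanner above: same idea, but results go into a list.
static List<string> ParseFields(string input)
{
    var fields = new List<string>();
    int pos = 0;

    while (pos < input.Length)
    {
        char c = input[pos];
        if (c == '{')
        {
            // The repeat count sits between '{' and the following '['.
            int open = input.IndexOf('[', pos + 1);
            int close = open < 0 ? -1 : input.IndexOf(']', open + 1);
            if (close < 0)
                break; // malformed input: stop rather than guess (an assumption)
            int count = int.Parse(input.Substring(pos + 1, open - pos - 1));
            string field = input.Substring(open + 1, close - open - 1);
            for (int i = 0; i < count; i++)
                fields.Add(field);
            // Continue after the closing '}' if there is one.
            int end = input.IndexOf('}', close + 1);
            pos = end < 0 ? input.Length : end + 1;
        }
        else if (c == '[')
        {
            int close = input.IndexOf(']', pos + 1);
            if (close < 0)
                break; // malformed input
            fields.Add(input.Substring(pos + 1, close - pos - 1));
            pos = close + 1;
        }
        else
        {
            pos++; // whitespace or any other character between fields
        }
    }
    return fields;
}

var parsed = ParseFields("[952,M] [782] {2[373,M]} {3[610]}{3[380]} [127]");
Console.WriteLine(string.Join(" | ", parsed));
```

Each entry can then be split on ',' afterwards if the comma-separated parts are needed individually.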

Upvotes: 1

pquest

Reputation: 3290

The regex below will handle both situations:

(?:\{([^\[]+)){0,1}\[([^\]]+)\]\}{0,1}

For matches without the curly braces, the first capture group will be empty. For matches with curly braces, the first group will contain the number of repeats. In both cases, the second group will contain the actual data. The link below shows a demo of this working:

Regex Demo

Note, however, that you will have to handle the repetition yourself in the code that makes use of the regex.
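
One possible way to handle that repetition in C# is sketched below. It uses the regex above unchanged; the method name ExpandFields and the sample input are assumptions for illustration. It is written as a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: expands every field, honoring the optional repeat count.
static List<string> ExpandFields(string input)
{
    // Group 1: optional repeat count after '{'; group 2: the text inside '[' ... ']'.
    var regex = new Regex(@"(?:\{([^\[]+)){0,1}\[([^\]]+)\]\}{0,1}");
    var fields = new List<string>();

    foreach (Match match in regex.Matches(input))
    {
        // Without curly braces group 1 does not participate, so the field appears once.
        int count = match.Groups[1].Success ? int.Parse(match.Groups[1].Value) : 1;
        for (int i = 0; i < count; i++)
            fields.Add(match.Groups[2].Value);
    }
    return fields;
}

var expanded = ExpandFields("[952,M] {2[373,M]} {3[610]}{3[380]} [128]");
Console.WriteLine(string.Join(" | ", expanded));
```

Note that int.Parse will throw if the text between '{' and '[' is not a number, so input that may be malformed would need int.TryParse or similar.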

Upvotes: 1
