Al2110

Reputation: 576

Splitting a string with a space/two spaces after the character

Consider a number of strings that are assumed to contain "keys" of the form "Wxxx", where each x is a digit from 0 to 9. Each string contains either a single key or several keys separated by ',' followed by two spaces. For example:

W123
W432
W546,  W234,  W167

The strings that contain multiple keys need to be split into an array. So, the last of the examples above should be split into an array like this: {"W546", "W234", "W167"}.

As a quick solution, String.Split comes to mind, but as far as I am aware, it can only take a single character, like ','. The problem is that it would return an array like this: {"W546", "  W234", "  W167"}. The two leading spaces in every entry from the second one onwards can probably be removed using Substring, but is there a better solution?

For context, these values are being held in a spreadsheet, and are assumed to have undergone data validation to ensure the "keys" are separated by a comma followed by two spaces.

while ((ws.Cells[row,1].Value != null) && !ws.Cells[row,1].Value.ToString().Equals(""))
{
    // there can be one key, or multiple keys separated by ','
    if (ws.Cells[row,keysCol].Value.ToString().Contains(','))
    {
        // there are multiple
        // need to split the ones in this cell separated by a comma           
    }
    else
    {
        // there is one
    }

    row++;
}

Upvotes: 1

Views: 300

Answers (4)

Barns

Reputation: 4848

You could use an old favorite: regular expressions.

Here are two flavors, 'Loop' and 'LINQ'.

    static void Main(string[] args)
    {
        var list = new List<string>{"W848","W998, W748","W953, W9484, W7373","W888"};

        Console.WriteLine("LINQ");
        list.ForEach(l => TestSplitRegexLinq(l));

        Console.WriteLine();
        Console.WriteLine("Loop");
        list.ForEach(l => TestSplitRegexLoop(l));
    }


    private static void TestSplitRegexLinq(string s)
    {
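        string pattern = @"W[0-9]+"; // "W" followed by one or more digits
        var reg = new Regex(pattern);
        // Cast<Match>() also compiles on .NET Framework, where MatchCollection is non-generic
        reg.Matches(s).Cast<Match>().ToList().ForEach(m => Console.WriteLine(m.Value));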
    }



    private static void TestSplitRegexLoop(string s)
    {
        string pattern = @"W[0-9]+"; // "W" followed by one or more digits
        var reg = new Regex(pattern);
        foreach (Match m in reg.Matches(s))
        {
            Console.WriteLine(m.Value);
        }
    }

Just replace the Console.WriteLine call with anything you want, e.g. myList.Add(m.Value).

You will need to add the namespace using System.Text.RegularExpressions; and, for the LINQ version, using System.Linq;
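As a rough sketch of that substitution, the matches can be collected into a list instead of being printed. The class name KeyExtractor and the method ExtractKeys are hypothetical, and the pattern assumes a key is "W" followed by at least one digit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyExtractor
{
    // Assumption: a key is "W" followed by one or more digits.
    private static readonly Regex KeyPattern = new Regex(@"W[0-9]+");

    // Returns every key found in the cell text, in order of appearance.
    public static List<string> ExtractKeys(string cellText)
    {
        var keys = new List<string>();
        foreach (Match m in KeyPattern.Matches(cellText))
        {
            keys.Add(m.Value);
        }
        return keys;
    }
}
```

Calling KeyExtractor.ExtractKeys on a cell with a single key should return a one-element list, so single-key and multi-key cells can go through the same code path.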

Upvotes: 2

Filburt

Reputation: 18061

You can just specify ',' and ' ' as separators and pass StringSplitOptions.RemoveEmptyEntries.

Using your sample of single keys and a string containing multiple keys you can just handle them all the same and get your list of individual keys:

List<string> cells = new List<string>() { "W123", "W432", "W546,  W234,  W167" };
List<string> keys = new List<string>();

foreach (string cell in cells)
{
    keys.AddRange(cell.Split(new char[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries));
}

Split can handle strings where there is nothing to split, and AddRange will accept your single keys as well as the multi-key split results.
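If the data validation guarantee from the question holds (a comma followed by exactly two spaces), another option is the Split overload that takes string separators. This is only a sketch under that assumption; the class name SeparatorSplit and the method SplitKeys are made up for illustration.

```csharp
using System;

public static class SeparatorSplit
{
    // Assumption: keys are separated by a comma followed by exactly two spaces.
    private static readonly string[] Separators = { ",  " };

    // Splits on the full three-character separator, so no trimming is needed.
    public static string[] SplitKeys(string cellText)
    {
        return cellText.Split(Separators, StringSplitOptions.None);
    }
}
```

Unlike the char-based version above, this leaves stray spaces in place if a cell does not follow the expected format, which can make validation failures easier to spot.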

Upvotes: 4

Al2110

Reputation: 576

After trying this in .NET fiddle, I think I may have a solution:

// if there are multiple
string keys = ws.Cells[row,keysCol].Value.ToString();

// remove spaces
string keys_normalised = keys.Replace(" ", string.Empty);
Console.WriteLine("Checking that spaces have been removed: " + keys_normalised + "\n");

string[] splits = keys_normalised.Split(',');
for (int i = 0; i < splits.Length; i++)
{
    Console.WriteLine(splits[i]);
}

This produces the following output in the console:

Checking that spaces have been removed: W456,W234,W167

W456
W234
W167
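A possible variation on the same idea is to split on the comma first and then trim each piece, which leaves any spaces inside a value untouched. This is a sketch only; the class name TrimSplit and the method SplitAndTrim are hypothetical.

```csharp
using System;
using System.Linq;

public static class TrimSplit
{
    // Splits on commas, trims surrounding whitespace from each piece,
    // and drops pieces that are empty after trimming.
    public static string[] SplitAndTrim(string cellText)
    {
        return cellText
            .Split(',')
            .Select(part => part.Trim())
            .Where(part => part.Length > 0)
            .ToArray();
    }
}
```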

Upvotes: 0

John Wu

Reputation: 52240

Eliminate the extra space first (using Replace()), then use Split().

var input = "W546, W234, W167";
var normalized = input.Replace(", ",",");  
var array = normalized.Split(',');

This way, you treat a comma followed by a space exactly the same as you'd treat a comma. If there might be two spaces you can also replace that:

var input = "W546,  W234, W167";
var normalized = input.Replace("  "," ").Replace(", ",",");  
var array = normalized.Split(',');
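If the number of spaces after the comma can vary, the normalise-then-split steps could be folded into a single Regex.Split call. This is a sketch rather than a drop-in replacement; the class name RegexSplit and the method SplitKeys are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RegexSplit
{
    // Splits on a comma followed by any amount of whitespace (including none).
    public static string[] SplitKeys(string input)
    {
        return Regex.Split(input, @",\s*");
    }
}
```

For an input such as "W546,  W234, W167", this should produce the same three keys as the chained Replace calls above.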

Upvotes: 2
