Bryan

Reputation: 37

C# Use Regex to split on Words

This is a stripped down version of code I am working on. The purpose of the code is to take a string of information, break it down, and parse it into key value pairs.

Using the info in the example below, a string might look like:

"DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567"

One further point about the above example: at least three of the features we have to parse out will occasionally include additional values. Here is an updated example string.

"DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568"

The problem with this is that the code refuses to split out DIVIDE and DIV information separately. Instead, it keeps splitting at DIV and then assigning the rest of the information as the value.

Is there a way to tell my code that DIVIDE and DIV need to be parsed out as two separate values, and to not turn DIVIDE into DIV?

public List<string> FeatureFilterStrings
{
    // All possible feature types from the EWSD switch.
    get
    {
        return new List<string>() { "DIVIDE", "DIV", "CLACOS", "INT" };
    }
}

public void Parse(string input)
{
    Func<string, bool> queryFilter = delegate(string line) { return FeatureFilterStrings.Any(s => line.Contains(s)); };

    Regex regex = new Regex(@"(?=\\bDIVIDE|DIV|CLACOS|INT)");
    string[] ms = regex.Split(updatedInput);
    List<string> queryLines = new List<string>();
    // Takes the parsed-out data and assigns it to the queryLines List<string>.
    foreach (string m in ms)
    {
        queryLines.Add(m);
    }

    var features = queryLines.Where(queryFilter);
    foreach (string feature in features)
    {
        foreach (Match m in Regex.Matches(workLine, valueExpression))
        {
            string key = m.Groups["key"].Value.Trim();
            string value = String.Empty;

            value = Regex.Replace(m.Groups["value"].Value.Trim(), @"s", String.Empty);
            AddKeyValue(key, value);
        }
    }
}

private void AddKeyValue(string key, string value)
{
    try
    {
        // Check if key already exists. If it does, remove the key and add the new key with updated value.
        // Value information appends to what is already there so no data is lost.
        if (this.ContainsKey(key))
        {
            this.Remove(key);
            this.Add(key, value.Split('&'));
        }
        else
        {
            this.Add(key, value.Split('&'));
        }
    }
    catch (ArgumentException)
    {
        // Already added to the dictionary.
    }
}

Further information: the strings do not have a set number of spaces between key/value pairs, a string may not include all of the features, and the features aren't always in the same order. Welcome to parsing old telephone switch information.
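
For reference, here is a minimal sketch of a lookahead split that keeps DIVIDE and DIV apart. Two things stand out in the pattern as posted: inside a verbatim string, `\\b` matches a literal backslash followed by `b` rather than a word boundary, and because the alternation is not grouped, the boundary would only apply to the first alternative anyway. The class and method names below (`SplitSketch`, `SplitFeatures`) are hypothetical, and the keyword list is simply the one from `FeatureFilterStrings`. This may not be the only thing going wrong in the full code, since `valueExpression` is not shown.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // Split in front of each whole-word keyword. The non-capturing group makes
    // both \b anchors apply to every alternative, so "DIV" cannot match inside "DIVIDE".
    private static readonly Regex FeatureStart =
        new Regex(@"(?=\b(?:DIVIDE|DIV|CLACOS|INT)\b)");

    public static string[] SplitFeatures(string input)
    {
        return FeatureStart.Split(input)
                           .Select(s => s.Trim())
                           .Where(s => s.Length > 0) // drop the empty piece before the first keyword
                           .ToArray();
    }

    public static void Main()
    {
        var input = "DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568";
        foreach (var part in SplitFeatures(input))
        {
            Console.WriteLine(part);
        }
        // Prints:
        // DIVIDE = KE48, KE49, KE50
        // CLACOS = 4566D
        // DIV = 3466
        // INT = 4567 & 4568
    }
}
```

Each piece then starts with exactly one feature name, regardless of the order the features appear in or how many spaces separate them.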

Upvotes: 0

Views: 113

Answers (2)

Enigmativity

Reputation: 117057

This might be a simple alternative for you.

Try this code:

var input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";

var parts = input.Split(new [] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);

var dictionary =
    parts.Select((x, n) => new { x, n })
         .GroupBy(xn => xn.n / 2, xn => xn.x)
         .Select(xs => xs.ToArray())
         .ToDictionary(xs => xs[0], xs => xs[1]);

I then get the following dictionary:

DIVIDE => KE48
CLACOS => 4556D
DIV => 3466
INT => 4567

Based on your updated input, things get more complicated, but this works:

var input = "DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568";

Func<string, char, string> tighten =
    (i, c) => String.Join(c.ToString(), i.Split(c).Select(x => x.Trim()));

var parts =
    tighten(tighten(input, '&'), ',')
    .Split(new[] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);

var dictionary =
    parts
        .Select((x, n) => new { x, n })
        .GroupBy(xn => xn.n / 2, xn => xn.x)
        .Select(xs => xs.ToArray())
        .ToDictionary(
            xs => xs[0],
            xs => xs
                .Skip(1)
                .SelectMany(x => x.Split(','))
                .SelectMany(x => x.Split('&'))
                .ToArray());

I get this dictionary:

DIVIDE => [KE48, KE49, KE50]
CLACOS => [4566D]
DIV => [3466]
INT => [4567, 4568]

Upvotes: 1

Eser

Reputation: 12546

I would create a dictionary from your input string:

string input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";

var dict = Regex.Matches(input, @"(\w+?) = (.+?)( |$)").Cast<Match>()
           .ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value);

Test the code:

foreach(var kv in dict)
{
    Console.WriteLine(kv.Key + "=" + kv.Value);
}
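
Note that the pattern above ends each value at the first space, so with the updated input from the question DIVIDE would get only `KE48,` and INT only `4567`. A possible variation, sketched here under the assumption that every key is a single word followed by `=`, lets the value run up to the next `word =` or the end of the string and then splits it on `,` and `&`. The names `KeyValueSketch`, `Pair`, and `Parse` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyValueSketch
{
    // key   : one word before '='
    // value : lazily matched text up to the next "word =" or the end of the input
    private static readonly Regex Pair = new Regex(
        @"(?<key>\w+)\s*=\s*(?<value>.*?)(?=\s+\w+\s*=|\s*$)");

    public static Dictionary<string, string[]> Parse(string input)
    {
        var result = new Dictionary<string, string[]>();
        foreach (Match m in Pair.Matches(input))
        {
            // The indexer overwrites an earlier entry if a key appears twice.
            result[m.Groups["key"].Value] = m.Groups["value"].Value
                .Split(new[] { ',', '&' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(v => v.Trim())
                .Where(v => v.Length > 0)
                .ToArray();
        }
        return result;
    }

    public static void Main()
    {
        var dict = Parse("DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568");
        foreach (var kv in dict)
        {
            Console.WriteLine(kv.Key + "=" + string.Join("|", kv.Value));
        }
    }
}
```

Because the key is matched as a whole word directly before `=`, this does not need the list of feature names at all, which also sidesteps the DIVIDE/DIV overlap.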

Upvotes: 2
