BBousman

Reputation: 103

Regular Expression for parsing ASCII data

Right now I have a couple of separate regular expressions to filter data from a string, but I'm curious whether there's a way to do it all in one go.

Sample Data: (DATA$0$34.0002,5.3114$34.0002,5.2925$34.0004,5.3214$34.0007,2.2527$34.0002,44.3604$34.0002,43.689$34.0004,38.3179$34.0007,8.1299)

  1. Need to verify there's an opening and a closing parenthesis ( )
  2. Need to verify there's a "DATA$0" after the open parenthesis
  3. Need to split the results by $
  4. Need to split that subset by comma
  5. Need to capture only the last item of that subset (i.e. 5.3114, 5.2925, 5.3214, etc.)

My first check is for the parentheses, using \(([^)]+)\) as my regex with the RightToLeft and ExplicitCapture options (some lines can have multiple data sets).

Next I filter for the DATA$0 using (?:\(DATA\$0)

Finally I do my splits and take the last value in each array to get what I need, but I'm trying to figure out if there's a better way.

string DataPattern = @"(?:\(DATA\$0)";
string ParenthesisPattern = @"\(([^)]+)\)";
RegexOptions options = RegexOptions.RightToLeft | RegexOptions.ExplicitCapture;

using StreamReader sr = new StreamReader(FilePath); // using declaration (C# 8+): disposes the reader when the enclosing scope ends
while (!sr.EndOfStream)
{
    string line = sr.ReadLine();
    Console.WriteLine(line);

    Match parentMatch = Regex.Match(line, ParenthesisPattern, options);
    if (parentMatch.Success)
    {
        string value = parentMatch.Value;

        Match dataMatch = Regex.Match(value, DataPattern);
        if (dataMatch.Success)
        {
            string output = parentMatch.Value.Replace("(DATA$0", "").Replace(")", "");
            string[] splitOutput = Regex.Split(output, @"\$");

            foreach (string x in splitOutput)
            {
                if (!string.IsNullOrEmpty(x))
                {
                    string[] splitDollar = Regex.Split(x, ",");
                    if (splitDollar.Length > 0)
                        Console.WriteLine("Value: " + splitDollar[splitDollar.Length - 1]);
                }
            }
        }
        else
            Console.WriteLine("NO DATA");
    }
    else
        Console.WriteLine("NO PARENTHESIS");

    Console.ReadLine();
}

TIA

Upvotes: 2

Views: 129

Answers (1)

Wiktor Stribiżew

Reputation: 627119

You can use

var results = Regex.Matches(text, @"(?<=\(DATA\$0[^()]*,)[^(),$]+(?=(?:\$[^()]*)?\))")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();

See the regex demo. Details:

  • (?<=\(DATA\$0[^()]*,) - a positive lookbehind that matches a location immediately preceded by (DATA$0, then zero or more chars other than ( and ) (as many as possible), then a comma
  • [^(),$]+ - one or more chars other than (, ), $ and a comma
  • (?=(?:\$[^()]*)?\)) - a positive lookahead that requires the current location to be immediately followed by an optional $ char with zero or more chars other than ( and ) after it, and then a ) char.
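
As a quick check of the lookbehind pattern, here is a minimal sketch that runs it against a shortened version of the sample data from the question. The class name LookbehindDemo and the method name LastFields are made up for illustration; the pattern itself is copied unchanged from the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Pattern copied from the answer; relies on .NET's support for
    // variable-length lookbehind.
    private const string Pattern =
        @"(?<=\(DATA\$0[^()]*,)[^(),$]+(?=(?:\$[^()]*)?\))";

    // Hypothetical helper: returns the last comma-separated field of every
    // $-delimited subset found inside a (DATA$0...) group.
    public static List<string> LastFields(string text) =>
        Regex.Matches(text, Pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

    public static void Main()
    {
        // Shortened version of the sample data in the question.
        string line = "(DATA$0$34.0002,5.3114$34.0002,5.2925$34.0004,5.3214)";

        foreach (string value in LastFields(line))
            Console.WriteLine("Value: " + value);
        // Prints:
        // Value: 5.3114
        // Value: 5.2925
        // Value: 5.3214
    }
}
```

Because the lookbehind demands (DATA$0 with no parentheses in between, values inside other parenthesized groups are skipped, and a line without a closing ) produces no matches at all.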

An alternative:

var results = Regex.Matches(text, @"(?:\G(?!^)|\(DATA\$0)[^()]*?,([^(),$]+)(?=(?:\$[^()]*)?\))")
    .Cast<Match>()
    .Select(x => x.Groups[1].Value)
    .ToList();

See the regex demo. Details:

  • (?:\G(?!^)|\(DATA\$0) - either the end of the previous successful match (the (?!^) lookahead stops \G from matching at the start of the string), or a (DATA$0 string
  • [^()]*? - zero or more chars other than ( and ), as few as possible
  • , - a comma
  • ([^(),$]+) - Group 1: one or more chars other than (, ), $ and a comma
  • (?=(?:\$[^()]*)?\)) - a positive lookahead matching a location that is immediately followed by an optional $ char with zero or more chars other than ( and ) after it, and then a ) char.
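
The \G-based variant can be tried on a line holding several data sets, which the question says can occur. This is only a sketch: the input line, the class name ContinuationDemo and the method name LastFields are invented for illustration, while the pattern is the one given above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ContinuationDemo
{
    // Pattern copied from the answer: each match either starts at (DATA$0
    // or continues from where the previous match ended (\G).
    private const string Pattern =
        @"(?:\G(?!^)|\(DATA\$0)[^()]*?,([^(),$]+)(?=(?:\$[^()]*)?\))";

    // Hypothetical helper: collects Group 1 (the value after the comma)
    // from every match.
    public static List<string> LastFields(string text) =>
        Regex.Matches(text, Pattern)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();

    public static void Main()
    {
        // Made-up line: two DATA$0 sets and one unrelated parenthesized group.
        string line = "(DATA$0$1.0,2.5$1.0,3.5) (OTHER$0$9.0,9.9) (DATA$0$1.0,4.5)";

        Console.WriteLine(string.Join(" ", LastFields(line)));
        // Prints: 2.5 3.5 4.5
    }
}
```

The chain of \G matches cannot cross a ) because [^()]*? excludes parentheses, so the unrelated (OTHER...) group is never entered and matching restarts only at the next (DATA$0.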

Upvotes: 1
