J newson

Reputation: 19

Using regex to split equations with variables in C#

I've been struggling with this for quite a while (not being a regex ninja), searching Stack Overflow and working through trial and error. I think I'm close, but there are still a few hiccups that I need help sorting out.

The requirement is that a given equation, which may include variables, exponents, and so on, is split by the regex pattern into its variables, constants, values, and operators. What I have so far:

     Regex re = new Regex(@"(\,|\(|\)|(-?\d*\.?\d+e[+-]?\d+)|\+|\-|\*|\^)");
     var tokens = re.Split(equation);

So an equation such as

    2.75423E-19* (var1-5)^(1.17)* (var2)^(1.86)* (var3)^(3.56)

should parse to

     [2.75423E-19 ,*, (, var1,-,5, ), ^,(,1.17,),*....,3.56,)]

However, the exponent portion is getting split as well, which I think is due to this part of the regex: |\+|\-.

Other variations I've tried are:

    Regex re1 = new Regex(@"([\,\+\-\*\(\)\^\/\ ])");

and

    Regex re = new Regex(@"(-?\d*\.?\d+e[+-]?\d+)|([\,\+\-\*\(\)\^\/\ ])");

both of which have their flaws. Any help would be appreciated.

Upvotes: 2

Views: 1438

Answers (2)

Wiktor Stribiżew

Reputation: 626903

For equations like the one posted in the original question, you can use

    [0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?|[-^+*/()]|\w+

See regex demo

The regex matches:

  • [0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? - a float number
  • | - or...
  • [-^+*/()] - any of the arithmetic operators and parentheses present in the equation posted
  • | - or...
  • \w+ - 1 or more word characters (letters, digits or underscore).
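
The `[eE]` in the first alternative is worth a closer look. The number sub-pattern from the question uses a lowercase `e` only, so without `RegexOptions.IgnoreCase` it cannot match the uppercase `E` in `2.75423E-19`. The following is a minimal sketch of that difference; the class and constant names are illustrative and not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExponentCaseDemo
{
    public const string Input = "2.75423E-19";

    // Number sub-pattern taken from the question: lowercase 'e' only.
    public const string LowercaseOnly = @"-?\d*\.?\d+e[+-]?\d+";

    // Number sub-pattern from this answer: accepts 'e' or 'E'.
    public const string EitherCase = @"[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?";

    public static void Main()
    {
        // Case-sensitive by default, so the uppercase 'E' is not matched.
        Console.WriteLine(Regex.IsMatch(Input, LowercaseOnly)); // False

        // The same pattern succeeds once case is ignored.
        Console.WriteLine(Regex.Match(Input, LowercaseOnly, RegexOptions.IgnoreCase).Value); // 2.75423E-19

        // The [eE] class makes the option unnecessary.
        Console.WriteLine(Regex.Match(Input, EitherCase).Value); // 2.75423E-19
    }
}
```

When the number alternative fails like this, the `-` inside the exponent is free to be picked up by the operator alternative, which is consistent with the splitting described in the question.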

For more complex tokenization, consider using NCalc, as suggested in Lucas Trzesniewski's comment.

C# sample code:

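    using System;
    using System.Text.RegularExpressions;

    var line = "2.75423E-19* (var1-5)^(1.17)* (var2)^(1.86)* (var3)^(3.56)";
    // Each match is one token: a number, an operator/parenthesis, or a name.
    var matches = Regex.Matches(line, @"[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?|[-^+*/()]|\w+");
    foreach (Match m in matches)
        Console.WriteLine(m.Value);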

Here is updated code showing that Regex.Split is not necessary here:

    // Reuses `line` from above; Cast/Select/ToList require `using System.Linq;`.
    var result = Regex.Matches(line, @"\d+(?:[,.]\d+)*(?:e[-+]?\d+)?|[-^+*/()]|\w+", RegexOptions.IgnoreCase)
                 .Cast<Match>()
                 .Select(p => p.Value)
                 .ToList();

Also, to match formatted numbers, you can use \d+(?:[,.]\d+)* rather than [0-9]*\.?[0-9]+ or \d+(,\d+)*.
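
To make the difference concrete, here is a small sketch comparing the two number patterns on a formatted value. The `MatchAll` helper and the class name are hypothetical, introduced only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FormattedNumberDemo
{
    // Hypothetical helper: returns every match of `pattern` in `input` as a string.
    public static string[] MatchAll(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string input = "1,234.56";

        // Digit groups joined by ',' or '.' are kept together as one number.
        Console.WriteLine(string.Join(" | ", MatchAll(input, @"\d+(?:[,.]\d+)*"))); // 1,234.56

        // The plain float pattern stops at the comma and yields two numbers.
        Console.WriteLine(string.Join(" | ", MatchAll(input, @"[0-9]*\.?[0-9]+"))); // 1 | 234.56
    }
}
```

One trade-off to keep in mind: because `,` is treated as part of a number, an argument list such as `1,2` would also be matched as a single token, so this form only suits input where commas never separate values.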

Upvotes: 4

J newson

Reputation: 19

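I think I've got a solution, thanks to @stribizhev, whose answer led me to this regex:

    // Whole pattern is one capturing group so Split keeps the matched tokens;
    // inner groups are non-capturing, and the decimal point is escaped.
    Regex re = new Regex(@"(\d+(?:,\d+)*(?:\.\d+)?(?:[eE][-+]?[0-9]+)?|[-^+*/()]|\w+)");
    tokenList = re.Split(InfixExpression).Select(t => t.Trim()).Where(t => t != "").ToList();

Splitting on this pattern gives me the desired array.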
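
For readers wondering why `Split` returns the tokens at all: in .NET, `Regex.Split` includes the text of capturing groups in its result, so wrapping the whole pattern in one group keeps every matched token, while the text between matches (mostly whitespace or empty strings) comes back as extra entries that need to be trimmed and filtered out. Below is a minimal sketch of that mechanism, using a simplified pattern and a shortened input chosen for illustration; the class and method names are assumptions, not the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitTokenizerDemo
{
    // Simplified illustrative pattern: number | operator or parenthesis | name,
    // wrapped in a single capturing group so Split returns the matched tokens.
    private static readonly Regex TokenPattern =
        new Regex(@"(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?|[-^+*/()]|\w+)");

    public static List<string> Tokenize(string expression) =>
        TokenPattern.Split(expression)
                    .Select(t => t.Trim())   // whitespace between tokens becomes ""
                    .Where(t => t != "")     // drop the empty leftovers
                    .ToList();

    public static void Main()
    {
        var tokens = Tokenize("2.5E-3* (var1-5)^(1.17)");
        Console.WriteLine(string.Join(" ", tokens)); // 2.5E-3 * ( var1 - 5 ) ^ ( 1.17 )
    }
}
```

Note that with this approach any character the pattern does not match is returned as leftover text rather than dropped, so unexpected symbols show up as tokens instead of disappearing silently.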

Upvotes: -1
