Clemens Cichocki
Clemens Cichocki

Reputation: 15

Mathematical formula string to php variables and operators

I have to following problem. I find it difficult to explain, as I am a hobby coder. Please forgive me if commit any major fauxpas:

I am working on a baseball database that deals with baseball specfic and non specific metrics/stats. Most of the date output is pretty simple when the metrics are cumulative or when I only want to display the dataset of one day that has been entered manually or imported from a csv. (All percentages are calculated and fed into the db as numbers)

For example, if I put in the stats of

{ Hits:1, 
  Walks:2, 
  AB:2, 
  Bavg:0.500 } 

for day one, and

{ Hits:2, 
  Walks:2, 
  AB:6, 
  Bavg:0.333 } 

and then try to get the totals, Hits, Walks and ABs are simple: SUM. But Bavg has to be a formula (Hits/AB). Other (non baseball specific) metrics, like vertical jump or 60 yard times are pretty straight forward too: MAX or MIN.

The user should be able to add his own metrics. So he has to be able to input a formula for the calculation. This calculation is stored in the database table as a column next to the metric name, id, type (cumulative, max, min, calculated).

The php script that produces the html table is setup to where it is dynamic to what ever metrics and however many metrics the query sends (the metrics can be part of several categories). In the end result, I want to replace all values of metrics of the calculated types with their formula.

My approach is to get the formula from the mysql table as a string. Then, in php, convert the string that could be Strikes/Pitches*100 into $strikes/$pitches*100 - assuming that is something I could put into an php sql query. However, before it is put into the $strikes/$pitches*100 format, I need to have those variables available to define them. That I'm sure I can do, but I'll cross that bridge when I get there.

Could you point me in the right direction of either how to accomplish that or tell where or what to search for? I'm sure this has been done before somewhere...

I highly appreciate any help!

Clemens

Upvotes: 0

Views: 688

Answers (2)

LSerni
LSerni

Reputation: 57418

The correct solution has already been given by Vilx-. So I will give you a not-so-correct, dirty solution.

As the correct solution states, eval is evil. But, it is also easy and powerful (as evil often is -- but I'll spare you my "hhh join the Dark Side, Luke hhh" spiel).

And since what you need to do is a very small and simple subset of SQL, you actually can use eval() - or even better its SQL equivalent, plugging user supplied code into a SQL query - as long as you do it safely; and with small requirements, this is possible.

(In the general case it absolutely is not. So keep it in mind - this solution is easy, quick, but does not scale. If the program grows beyond a certain complexity, you'll have to adopt Vilx-'s solution anyway).

You can verify the user-supplied string to ensure that, while it might not be syntactically or logically correct, at least it won't execute arbitrary code.

This is okay:

SELECT SUM(pitch)+AVG(runs)-5*(MIN(balls)) /* or whatever */

and this, while wrong, is harmless too:

SELECT SUM(pitch +

but this absolutely is not (mandatory XKCD reference):

SELECT "Robert'); DROP TABLE Students;--

and this is even worse, since the above would not work on a standard MySQL (that doesn't allow multiple statements by default), while this would:

SELECT SLEEP(3600)

So how do we tell the harmless from the harmful? We start by defining placeholder variables that you can use in your formula. Let us say they will always be in the form {name}. So, get those - which we know to be safe - out of the formula:

$verify = preg_replace('#{[a-z]+}#', '', $formula);

Then, arithmetic operators are also removed; they are safe too.

$verify = preg_replace('#[+*/-]#', '', $verify);

Then numbers and things that look like numbers:

$verify = preg_replace('#[0-9.]+#', '', $verify);

Finally a certain number of functions you trust. The arguments of these functions may have been variables or combinations of variables, and therefore they've been deleted and the function has now no arguments - say, SUM() - or you had nested functions or nested parentheses, like SUM (SUM( ()())).

You keep replacing () (with any spaces inside) with a single space until the replacement no longer finds anything:

for ($old = ''; $old !== $verify; $verify = preg_replace('#\\s*\\(\\s*\\)\\s*#', ' ', $verify)) {
    $old = $verify;
}

Now you remove from the result the occurrences of any function you trust, as an entire word:

for ($old = ''; $old !== $verify; $verify = preg_replace('#\\b(SUM|AVG|MIN|MAX)\\b#', ' ', $verify)) {
    $old = $verify;
}

The last two steps have to be merged because you might have both nested parentheses and functions, interfering with one another:

for ($old = ''; $old !== $verify; $verify = preg_replace('#\\s*(\\b(SUM|AVG|MIN|MAX)\\b|\\(\\s*\\))\\s*#', ' ', $verify)) {
    $old = $verify;
}

And at this point, if you are left with nothing, it means the original string was harmless (at worst it could have triggered a division by 0, or a SQL exception if it was syntactically wrong). If instead you're left with something, the formula is rejected and never saved in the database.

When you have a valid formula, you can replace variables using preg_replace_callback() so that they become numbers (or names of columns). You're left with what is either valid, innocuous SQL code, or incorrect SQL code. You can plug this directly into the query, after wrapping it in try/catch to intercept any PDOException or division by zero.

Upvotes: 1

Vilx-
Vilx-

Reputation: 106970

I'll assume that the requirement is indeed to allow the user to enter arbitrary formulas. As noted in the comments, this is indeed no small task, so if you can settle for something less, I'd advise doing so. But, assuming that nothing less will do, let's see what can be done.

The simplest idea is, of course, to use PHP's eval() function. You can execute arbitrary PHP code from a string. All you need to do is to set up all necessary variables beforehand and grab the return value.

It does have drawbacks however. The biggest one is security. You're essentially executing arbitrary user-supplied code on your server. And it can do ANYTHING that your own code can. Access files, use your database connections, change variables, whatever. Unless you completely trust your users, this is a security disaster.

Also syntax or runtime errors can throw off the rest of your script. And eval() is pretty slow too, since it has to parse the code every time. Maybe not a big deal in your particular case, but worth keeping an eye on.

All in all, in every language that has an eval() function, it is almost universally considered evil and to be avoided at all costs.

So what's the alternative? Well, a dedicated formula parser/executor would be nice. I've written one a few times, but it's far from trivial. The job is easier if the formula is written in Polish notation or Reverse Polish Notation, but those are a pain to write unless you've practiced. For normal formulas, take a look at the Shunting Yard Algorithm. It's straightforward enough and can be easily adapted to functions and whatnot. But it's still fairly tedious.

So, unless you want to do it as a fun challenge, look for a library that has already done it. There seem to be a bunch of them out there. Search for something along the lines of "arithmetic expression parser library php".

Upvotes: 1

Related Questions