Reputation: 311
Say I have a string like the following:
var str = "23*45+ 4*12"
How can I use regex to return
["23","*","45","+","4","*","12"]
How would you extend this to include decimals and negative numbers as well?
var str1 = "45.3*23+(-4*5.8)";
Result should look like:
["45.3","*","23","+","-4","*","5.8"]
Please explain your answer as I am new to regex. Thanks!
Upvotes: 0
Views: 320
Reputation: 88378
The simplest way to solve your particular problem is to split on word boundaries:
$ node
> "23*45+ 4*12".split(/\b/)
[ '23', '*', '45', '+ ', '4', '*', '12' ]
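The regex /\b/ matches a word boundary: the zero-width position between a word character (a letter, digit, or underscore) and a non-word character. So the string is cut wherever a run of digits starts or ends.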
Now since you have spaces, you should probably trim each result:
$ node
> "23*45+ 4*12".split(/\b/).map(function (s) {return s.trim()})
[ '23', '*', '45', '+', '4', '*', '12' ]
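Here is the split-and-trim idea as a reusable function. The name `splitTokens` is hypothetical, and the extra `filter` step is an addition that drops the empty strings left over when a chunk was pure whitespace (for example a leading space). The last two calls show where splitting on word boundaries stops working: the decimal point is itself a non-word character, and adjacent operators stay glued together.

```javascript
// Hypothetical helper: split on word boundaries, trim each chunk, drop empty chunks.
function splitTokens(str) {
  return str
    .split(/\b/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

console.log(splitTokens("23*45+ 4*12")); // → [ '23', '*', '45', '+', '4', '*', '12' ]
console.log(splitTokens("45.3*2"));      // → [ '45', '.', '3', '*', '2' ]
console.log(splitTokens("3+-4"));        // → [ '3', '+-', '4' ]
```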
Now if you only want to capture the numbers and the arithmetic operators, you should probably match them directly rather than relying on splitting. The regex to match an integer is
\d+
which is one or more digits, and the regex to match an operator is
[-+/*]
Note that I put the dash first because if it were in the middle it would form a character range (like [a-z], which matches 26 characters, whereas [-az] matches only three). Now you can do this:
$ node
> "23*45+ 4*12".match(/[-+/*]|\d+/g)
[ '23', '*', '45', '+', '4', '*', '12' ]
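To see why the dash has to go first in [-+/*], here is a quick check with made-up test strings. Written as [+-/*], the dash forms a range from + to /, which in ASCII also covers the comma, the dash, and the dot, so a decimal point would be mistaken for an operator.

```javascript
// Dash first: a literal "-" followed by "+", "/" and "*".
const ops = /^[-+/*]$/;
// Dash between "+" and "/": a range from U+002B to U+002F, i.e. + , - . /
const accidentalRange = /^[+-/*]$/;

console.log(ops.test("-"), ops.test("."));                         // → true false
console.log(accidentalRange.test("-"), accidentalRange.test(".")); // → true true

// The same rule with letters: [a-z] is a range, [-az] is just three characters.
console.log(/^[a-z]$/.test("m"), /^[-az]$/.test("m"));             // → true false
```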
The g is a modifier to the regex that says to return all the matches (the g stands for "global"). One interesting thing about this last approach is that it skips characters that don't belong, so this can happen:
$ node
> "23 * blah45 + 4*~~~~12".match(/[-+/*]|\d+/g)
[ '23', '*', '45', '+', '4', '*', '12' ]
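If you would rather be told about stray characters than have them silently skipped, one option (a variation, not part of the approach above) is to add a catch-all alternative and treat a match on it as an error. The helper name `strictTokens` is hypothetical, and `String.prototype.matchAll` needs a reasonably recent Node (12 or later).

```javascript
// Hypothetical helper: like the match() call above, but reports stray
// characters instead of skipping them.
function strictTokens(str) {
  const tokens = [];
  // Three alternatives, tried in order at each position:
  //   \s+            whitespace, ignored
  //   (\d+|[-+/*])   a token we want (group 1)
  //   (.)            any other single character (group 2), treated as an error
  const re = /\s+|(\d+|[-+/*])|(.)/g;
  for (const m of str.matchAll(re)) {
    if (m[2] !== undefined) {
      throw new SyntaxError(`Unexpected "${m[2]}" at index ${m.index}`);
    }
    if (m[1] !== undefined) tokens.push(m[1]);
  }
  return tokens;
}

console.log(strictTokens("23*45+ 4*12")); // → [ '23', '*', '45', '+', '4', '*', '12' ]
// strictTokens("23 * blah45") throws: Unexpected "b" at index 5
```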
Now let's suppose you wanted to add floating point numbers. Those look like 22.807, i.e., digits, then a dot, then more digits. The dot is special in regexes (on its own it matches any character except a line break), so we have to escape it with a backslash and write \. instead. So to capture something that can be either an integer or a floating point value we would write:
\d+(\.\d+)?
where the ? makes the preceding group optional. Now we could also add a leading optional negative sign:
-?\d+(\.\d+)?
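A quick way to try the pattern on its own is to anchor it with ^ and $ so it must match the whole string (the sample inputs are invented). It also shows what the pattern leaves out: a trailing dot or a missing leading digit. Unanchored, with the operators as a second alternative, it already produces the result asked for in the question's second example; the parentheses match neither alternative, so they are skipped.

```javascript
// Signed integer or decimal, anchored so the whole string must match.
const signedDecimal = /^-?\d+(\.\d+)?$/;
for (const s of ["22.807", "-4", "22.", ".5"]) {
  console.log(s, signedDecimal.test(s));
}
// → 22.807 true
//   -4 true
//   22. false
//   .5 false

// Unanchored and with /g, applied to the question's second string.
const str1 = "45.3*23+(-4*5.8)";
const tokens1 = str1.match(/-?\d+(\.\d+)?|[-+/*]/g);
console.log(tokens1); // → [ '45.3', '*', '23', '+', '-4', '*', '5.8' ]
```

The capturing group does not show up in the result because match() with the g flag returns only whole matches.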
And as you know from programming, we like to write numbers like 255.84E-19, which gets us to:
-?\d+(\.\d+([Ee][+-]?\d+)?)?
with an optional exponential part, containing either an upper or lower case E, then an optional sign, then a required exponent number.
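One consequence of how this pattern is written: the exponent group is nested inside the fractional group, so an exponent is only accepted after a decimal part, and 4E7 is rejected. The check below uses invented inputs; the `siblings` pattern is a variation (not from the answer) that makes the fraction and the exponent independent optional groups.

```javascript
// The pattern above: the exponent group sits inside the fractional group.
const nested = /^-?\d+(\.\d+([Ee][+-]?\d+)?)?$/;
// Variation: fraction and exponent as two independent optional groups.
const siblings = /^-?\d+(\.\d+)?([Ee][+-]?\d+)?$/;

for (const s of ["255.84E-19", "4.2E7", "4E7", "-12"]) {
  console.log(s, nested.test(s), siblings.test(s));
}
// → 255.84E-19 true true
//   4.2E7 true true
//   4E7 false true
//   -12 true true
```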
Plain parentheses also create a capturing group, which saves whatever it matched. Since we only need them for grouping here, pros would write non-capturing groups, (?:...), which skip that bookkeeping and can be slightly faster:
-?\d+(?:\.\d+(?:[Ee][+-]?\d+)?)?
and in all its glory you get
$ node
> "23 * -99.45 + 4.2E7 * -12".match(/-?\d+(?:\.\d+(?:[Ee][+-]?\d+)?)?|[-+*/]/g)
[ '23', '*', '-99.45', '+', '4.2E7', '*', '-12' ]
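Wrapped up as a function (the name `tokenize` is hypothetical), the final regex also handles the second string from the question. One caveat: because the number alternative is tried first and allows a leading -, a minus written between two numbers with no spaces is read as a sign, so 5-3 comes back as two tokens rather than three.

```javascript
// The final pattern: a number (optional sign, fraction, exponent) or an operator.
const TOKEN = /-?\d+(?:\.\d+(?:[Ee][+-]?\d+)?)?|[-+*/]/g;

// Hypothetical helper. match() returns null when nothing matches, hence the || [].
// Sharing one /g regex is safe here because match() with /g starts from index 0.
function tokenize(str) {
  return str.match(TOKEN) || [];
}

console.log(tokenize("45.3*23+(-4*5.8)")); // → [ '45.3', '*', '23', '+', '-4', '*', '5.8' ]
console.log(tokenize("5-3"));              // → [ '5', '-3' ]
console.log(tokenize("no numbers here"));  // → []
```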
There's a lot going on here. Sorry to get so deep into it.
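What a capturing group saves can be seen with exec(), which returns the whole match followed by one entry per capturing group (the inputs are invented). split() also puts captured text into its result, which is a common surprise. With match() and the g flag, as in the final example, captures are not returned at all, so there the non-capturing form mainly avoids saving text nobody reads.

```javascript
// Capturing group: exec() returns the whole match plus what the group saved.
const capturing = /-?\d+(\.\d+)?/.exec("-99.45");
// Non-capturing group: same overall match, no extra entry.
const nonCapturing = /-?\d+(?:\.\d+)?/.exec("-99.45");

console.log(capturing[0], capturing[1], capturing.length); // → -99.45 .45 2
console.log(nonCapturing[0], nonCapturing.length);         // → -99.45 1

// split() includes captured separators in its output.
console.log("1-2".split(/(-)/));   // → [ '1', '-', '2' ]
console.log("1-2".split(/(?:-)/)); // → [ '1', '2' ]
```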
By the way, the general problem of parsing and validating arithmetic expressions can't be solved by a regex alone (arbitrarily nested parentheses are beyond what a regular expression can match), but hopefully this explanation will get you started.
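For a taste of what going beyond a regex looks like, here is a minimal recursive-descent evaluator sketch. It is not part of the answer above: the function names, the grammar, and the choice to treat a - in front of a factor as unary minus are all assumptions of this sketch. The regex is still used, but only to split the input into tokens; the grammar functions handle operator precedence and nested parentheses. The number pattern here deliberately has no leading sign, so 5-3 is read as subtraction.

```javascript
// Lexer: unsigned numbers, operators, and parentheses (unknown characters are skipped).
function lex(str) {
  return str.match(/\d+(?:\.\d+)?|[-+*/()]/g) || [];
}

// Hypothetical evaluator for the grammar:
//   expr   := term (("+" | "-") term)*
//   term   := factor (("*" | "/") factor)*
//   factor := "-" factor | number | "(" expr ")"
function evaluate(str) {
  const tokens = lex(str);
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function expr() {
    let value = term();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      const rhs = term();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  function term() {
    let value = factor();
    while (peek() === "*" || peek() === "/") {
      const op = next();
      const rhs = factor();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  function factor() {
    const tok = next();
    if (tok === "-") return -factor(); // unary minus
    if (tok === "(") {
      const value = expr();
      if (next() !== ")") throw new SyntaxError("Expected )");
      return value;
    }
    if (tok !== undefined && /^\d/.test(tok)) return parseFloat(tok);
    throw new SyntaxError(`Unexpected token ${tok}`);
  }

  const result = expr();
  if (pos !== tokens.length) throw new SyntaxError(`Unexpected token ${peek()}`);
  return result;
}

console.log(evaluate("23*45+ 4*12")); // → 1083
console.log(evaluate("2*(3+4)"));     // → 14
console.log(evaluate("5-3"));         // → 2
```

Like the match() approach in the answer, lex() silently skips characters it does not recognize; a stricter lexer would reject them.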
Upvotes: 3