Reputation: 8325
I am trying to split the d attribute on a path tag in an SVG file into tokens.
This one is relatively easy:
d = "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7"
tokens = d.split(/[\s,]/)
But this is also a valid d attribute:
d = "M2-12C5,15,21,19,27-2C17,12-3,40,5,7"
The tricky parts are that letters and numbers are no longer separated, and negative numbers use only the minus sign as the separator. How can I create a regex that handles this?
The rules seem to be:
I know I can use lookaround, for example:
tokens = d.split(/(?<=\d)(?=\D)|(?<=\D)(?=\d)/)
I'm having trouble forming a single regex that also splits on the minus signs and keeps the minus sign with the numbers.
The above code should tokenize as follows:
[ 'M', '2', '-12', 'C', '5', '15', '21', '19', '27', '-2', 'C', '17', '12', '-3', '40', '5', '7' ]
Upvotes: 5
Views: 515
Reputation: 22817
Unfortunately, JavaScript engines that predate ES2018 don't support lookbehinds, so your options there are fairly limited and the regex in the Other Regex Engines section below will not work for you (although it will in engines that do support lookbehinds).
Note: The regex in this section (Other Regex Engines) will not work in JavaScript engines without lookbehind support. See the JavaScript solution in the Code section instead.
I think with your original regex you were trying to get to:
[, ]|(?<![, ])(?=-|(?<=[a-z])\d|(?<=\d)[a-z])
This regex allows you to split on those matches: a comma or a space, or a location that is followed by -, or a location where a letter precedes a digit, or a location where a digit precedes a letter.
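For an engine that does support lookbehind (for example, a JavaScript runtime implementing ES2018 or later), a minimal sketch of splitting with the regex above might look like this. The i flag and the tokenizeBySplit helper name are additions for illustration, and the sketch only targets integer values as in the question.

```javascript
// Hypothetical helper; requires a regex engine with lookbehind support.
// The i flag is an addition so that uppercase commands such as M and C match [a-z].
var splitter = /[, ]|(?<![, ])(?=-|(?<=[a-z])\d|(?<=\d)[a-z])/i;

function tokenizeBySplit(d) {
  // Drop empty strings that could appear with leading or trailing separators.
  return d.split(splitter).filter(function (t) { return t !== ""; });
}

console.log(tokenizeBySplit("M2-12C5,15,21,19,27-2C17,12-3,40,5,7"));
console.log(tokenizeBySplit("M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7"));
```

Both calls should print the token list from the question, since the zero-width alternatives split at letter/digit boundaries and before minus signs, while the first alternative consumes commas and spaces.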
var a = [
"M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7",
"M2-12C5,15,21,19,27-2C17,12-3,40,5,7"
]
var r = /-?(?:\d*\.)?\d+|[a-z]/gi
a.forEach(function(s){
console.log(s.match(r));
});
-?(?:\d*\.)?\d+|[a-z]
Match either of the following:
-?(?:\d*\.)?\d+
-? matches - literally zero or one time
(?:\d*\.)? matches the following zero or one time:
\d* matches any number of digits
\. matches a literal dot
\d+ matches one or more digits
[a-z]
Matches any character in the range from a-z (any lowercase alpha character; since the i modifier is used, this also matches the uppercase variants of those letters)
I added (?:\d*\.)? because (to the best of my knowledge) you can have decimal number values in SVG d attributes.
Note: Changed the original regex portion of \d+(?:\.\d+)? to (?:\d*\.)?\d+ in order to catch numbers that don't have the whole number part, such as .5, as per @Thomas (see comments below question).
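To make the effect of that change concrete, here is a small sketch comparing the two number patterns on an input containing .5. The sample string and variable names are made up for illustration.

```javascript
// Earlier pattern: requires at least one digit before an optional fraction.
var oldPattern = /-?\d+(?:\.\d+)?|[a-z]/gi;
// Revised pattern: the whole-number part before the dot is optional.
var newPattern = /-?(?:\d*\.)?\d+|[a-z]/gi;

var sample = "M.5-1.5L10,20"; // hypothetical path data

var oldTokens = sample.match(oldPattern);
var newTokens = sample.match(newPattern);

console.log(oldTokens); // [ 'M', '5', '-1.5', 'L', '10', '20' ]  (the leading dot of .5 is lost)
console.log(newTokens); // [ 'M', '.5', '-1.5', 'L', '10', '20' ]
```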
Upvotes: 5
Reputation: 43169
You could go for
-?\d+|[A-Z]
var matches = "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7".match(/-?\d+|[A-Z]/g);
// matches holds the different tokens
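As a quick check, the sketch below applies the same pattern to both strings from the question and also shows two limits worth knowing about: without the i flag lowercase commands are skipped, and decimals are split at the dot. The variable names and the last two sample inputs are made up for illustration.

```javascript
var intPattern = /-?\d+|[A-Z]/g;

var spacedTokens = "M 2 -12 C 5 15 21 19 27 -2 C 17 12 -3 40 5 7".match(intPattern);
var compactTokens = "M2-12C5,15,21,19,27-2C17,12-3,40,5,7".match(intPattern);

// Both forms of the d attribute yield the same token list.
console.log(spacedTokens.join(" ") === compactTokens.join(" ")); // true

// Limits of this pattern (hypothetical inputs):
var lowercaseTokens = "m1".match(intPattern);   // [ '1' ]  (the lowercase command is dropped)
var decimalTokens = "M1.5".match(intPattern);   // [ 'M', '1', '5' ]  (the decimal is split)
console.log(lowercaseTokens, decimalTokens);
```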
Upvotes: 1