Reputation: 1455
I would like to parse the commands in the d attribute of a path element in an SVG, and I would like to do it efficiently. I therefore decided to use a regex to avoid writing several loops.
What I want to achieve is to put the command letter along with its numeric values in a tuple and store all those tuples in a list e.g. [('M', '3', '18'), ('h', '10'), ...]
Depending on the command letter, one to six numeric values can follow. These numeric values can contain a dot ('.45'), a minus ('-3'), or both ('-.55'), and they are not always separated by spaces, e.g. 'c -.55.45 0 1 '.
My Approach:
Here is what I tried so far. I tried to separate the tokens with re.findall, but afterwards I still have to group them with an additional loop, and numeric values joined by a dot are still stuck together. I would also like to fold the replace call into the findall pattern.
import re
# Just an extract of a d attribute
d = 'M20 3H4c-.55 0-1 .45-1 1v6c0 .55.45 1 1 1h16'
# Raw string avoids invalid-escape warnings; the command class lists q and Q
commands = re.findall(r"[mMzZlLhHvVcCsSqQtTaA]|[0-9\-.]+", d.replace("-", " -"))
#output: ['M', '20', '3', 'H', '4', 'c', '-.55', '0', '-1', '.45', '-1', '1', 'v', '6', 'c', '0', '.55.45', '1', '1', '1', 'h', '16']
#goal: [('M', '20', '3'), ('H', '4'), ('c', '-.55', '0', '-1', '.45', '-1', '1'), ('v', '6'), ('c', '0', '.55', '.45', '1', '1', '1'), ('h', '16')]
Those dot-joined numeric values seem easy to handle: just split them at the dots. But that does not work, because a value like '1.55' has to stay intact; in that case it is separated from its neighbour by a space ('.55 1.45'). As I have had a hard time with these regex patterns, it would be awesome if someone had a solution or could at least point me in the right direction.
If I missed something or you need more information, just tell me and I will provide it. Thank you in advance!
Upvotes: 2
Views: 452
Reputation: 626699
If there can be only zero to six arguments, the best you can do with a one-regex approach is to use
re.findall(r"([mMzZlLhHvVcCsSqQtTaA])(?:\s*(-?\d*\.?\d+))?(?:\s*(-?\d*\.?\d+))?(?:\s*(-?\d*\.?\d+))?(?:\s*(-?\d*\.?\d+))?(?:\s*(-?\d*\.?\d+))?(?:\s*(-?\d*\.?\d+))?", d)
See the regex demo. The (?:\s*(-?\d*\.?\d+))? pattern is repeated 6 times to match 0 to 6 arguments and capture each of them into its own group. (?:...)? is an optional non-capturing group; \s*(-?\d*\.?\d+) matches 0+ whitespace characters (\s*), and (-?\d*\.?\d+) captures into a group an optional - (-?), 0+ digits (\d*), an optional dot (\.?) and 1+ digits (\d+).
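To see why this numeric sub-pattern separates dot-joined values, here is a minimal sketch that applies it on its own to the tricky fragments from the question. The name NUMBER is only illustrative and not part of the original answer.

```python
import re

# Numeric sub-pattern from the answer: optional minus, optional integer
# digits, optional dot, then at least one digit.
NUMBER = re.compile(r"-?\d*\.?\d+")

# A second dot cannot belong to the same number, so '.55.45' splits in two.
print(NUMBER.findall(".55.45"))            # ['.55', '.45']

# A minus sign starts a new number even without a preceding space.
print(NUMBER.findall("-.55 0-1 .45-1 1"))  # ['-.55', '0', '-1', '.45', '-1', '1']

# Values with an integer part stay intact.
print(NUMBER.findall("1.55 1.45"))         # ['1.55', '1.45']
```

Because the pattern allows at most one dot per match, no d.replace("-", " -") preprocessing is needed.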
See Python demo:
import re
d = 'M0 0h24v24H0z'
commands = re.findall(r"([mMzZlLhHvVcCsSqQtTaA])(?:\s*(-?\d*\.?\d+))?(?:\s*(-?\d*\.?\d+))?(?:\s*(-?\d*\.?\d+))?(?:\s*(-?\d*\.?\d+))?(?:\s*(-?\d*\.?\d+))?(?:\s*(-?\d*\.?\d+))?", d)
# Unmatched optional groups come back as '', so filter them out of each tuple
print([tuple(filter(None, x)) for x in commands])
# => [('M', '0', '0'), ('h', '24'), ('v', '24'), ('H', '0'), ('z',)]
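If the six-argument limit is a concern (the arc command takes seven values, and a command letter may be followed by several argument sets), one possible alternative is a two-step approach: first split the string into a command letter plus everything up to the next letter, then extract the numbers from each chunk. This is only a sketch, not the single-regex solution above; parse_path, COMMAND and NUMBER are hypothetical names, and exponent notation such as 1e-5 is assumed not to occur.

```python
import re

# Step 1: a command letter followed by everything up to the next command letter.
COMMAND = re.compile(r"([mMzZlLhHvVcCsSqQtTaA])([^mMzZlLhHvVcCsSqQtTaA]*)")
# Step 2: the same numeric sub-pattern as in the answer (no exponent support).
NUMBER = re.compile(r"-?\d*\.?\d+")

def parse_path(d):
    """Return a list of (command, arg1, arg2, ...) tuples of strings."""
    return [(cmd, *NUMBER.findall(args)) for cmd, args in COMMAND.findall(d)]

d = 'M20 3H4c-.55 0-1 .45-1 1v6c0 .55.45 1 1 1h16'
print(parse_path(d))
# [('M', '20', '3'), ('H', '4'), ('c', '-.55', '0', '-1', '.45', '-1', '1'),
#  ('v', '6'), ('c', '0', '.55', '.45', '1', '1', '1'), ('h', '16')]
```

The comprehension still loops once per command, but the number of arguments per command is no longer capped by the number of groups in the pattern.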
Upvotes: 1