Reputation: 57
I'm looking for a Python library to parse the "points" values from the <polyline> elements of an SVG and print them. Ideally it would parse the SVG straight from the URL, but I could also save the SVG and do it locally.
I just need it to parse the points values and print them separately for each polyline element, so it prints something like this for the points value of the current <polyline> element:
[[239,274],[239,274],[239,274],[239,275],[239,275],[238,276],[238,276],[237,276],[237,276],[236,276],[236,276],[236,277],[236,277],[235,277],[235,277],[234,278],[234,278],[233,279],[233,279],[232,280],[232,280],[231,280],[231,280],[230,280],[230,280],[230,280],[229,280],[229,280]]
So after the first polyline element is parsed and printed, it would parse the next polyline element, get its points value, and print it just like the first one, until there are no more to print.
The SVG's URL: http://colorillo.com/bx0l.inline.svg
Here is an example of a polyline element from the SVG:
<polyline points="239,274 239,274 239,274 239,275 239,275 238,276 238,276 237,276 237,276 236,276 236,276 236,277 236,277 235,277 235,277 234,278 234,278 233,279 233,279 232,280 232,280 231,280 231,280 230,280 230,280 230,280 229,280 229,280" style="fill: none; stroke: #000000; stroke-width: 1; stroke-linejoin: round; stroke-linecap: round; stroke-antialiasing: false; stroke-antialias: 0; opacity: 0.8"/>
I'm just looking for some quick help and an example. If you're able to help me out, that would be neat.
Upvotes: 0
Views: 1491
Reputation: 23815
Below is one way to do it with requests and xml.etree.ElementTree:
import xml.etree.ElementTree as ET
from collections import namedtuple
import requests
import re

Point = namedtuple('Point', 'x y')
all_points = []
r = requests.get('http://colorillo.com/bx0l.inline.svg')
if r.status_code == 200:
    # Strip the default namespace so elements can be found by their plain tag name.
    data = re.sub(' xmlns="[^"]+"', '', r.content.decode('utf-8'), count=1)
    root = ET.fromstring(data)
    poly_lines = root.findall('.//polyline')
    for poly_line in poly_lines:
        tmp = []
        # split() with no argument tolerates repeated or trailing whitespace.
        _points = poly_line.attrib['points'].split()
        for _p in _points:
            tmp.append(Point(*[int(z) for z in _p.split(',')]))
        all_points.append(tmp)

# Print one bracketed list per polyline, e.g. [[239,274],[239,275]]
for points in all_points:
    tmp = [str([p.x, p.y]).replace(' ','') for p in points]
    line = ','.join(tmp)
    print('[' + line + ']')
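As an alternative to stripping the xmlns attribute with a regex, ElementTree can be given a prefix-to-namespace mapping when searching. The following is only a sketch: SVG_TEXT is a made-up two-polyline document standing in for the downloaded file, polyline_points is a hypothetical helper name, and it assumes the file declares the standard SVG namespace and uses integer coordinates.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the downloaded SVG (not the real file contents).
SVG_TEXT = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<polyline points="239,274 239,275 238,276"/>'
    '<g><polyline points="10,20 30,40"/></g>'
    '</svg>'
)

# Prefix-to-URI mapping passed to findall(); the prefix name is arbitrary.
SVG_NS = {'svg': 'http://www.w3.org/2000/svg'}


def polyline_points(svg_text):
    """Return one list of [x, y] integer pairs per <polyline> element."""
    root = ET.fromstring(svg_text)
    result = []
    for poly in root.findall('.//svg:polyline', SVG_NS):
        pairs = poly.get('points', '').split()
        result.append([[int(v) for v in pair.split(',')] for pair in pairs])
    return result


# One printed line per polyline, in document order.
for points in polyline_points(SVG_TEXT):
    print(points)
```

If the SVG has been saved locally, ET.parse(path).getroot() could replace ET.fromstring(svg_text); the rest would stay the same.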
Upvotes: 0
Reputation: 13510
I believe there is an HTML extraction package somewhere, but this is the kind of task I would do with core Python and the regular expressions module. Let txt be the text you presented (<polyline ...), so:
Importing the regular expression module:
In [22]: import re
Performing the search:
In [24]: g = re.search('polyline points="(.*?)"', txt)
In the above regex I use polyline points=" as an anchor (I omitted the leading <, although it has no special meaning on its own in a regex, so including it would also work) and capture everything up to the next quotation mark.
The text you want is then available as:
In [25]: g.group(1)
Out[25]: '239,274 239,274 239,274 239,275 239,275 238,276 238,276 237,276 237,276 236,276 236,276 236,277 236,277 235,277 235,277 234,278 234,278 233,279 233,279 232,280 232,280 231,280 231,280 230,280 230,280 230,280 229,280 229,280'
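Since the question asks for every polyline rather than just the first, the same pattern can be used with re.findall, which returns the captured group for each match. This is a rough sketch: the txt below is a made-up two-element sample, and the pattern assumes points is the first attribute after the tag name and is written with double quotes, as in the example from the question.

```python
import re

# Hypothetical text containing two polyline elements.
txt = (
    '<polyline points="239,274 239,275" style="fill: none"/>'
    '<polyline points="10,20 30,40" style="fill: none"/>'
)

# findall returns the contents of the capture group for every match.
all_points = re.findall(r'<polyline points="(.*?)"', txt)
for points in all_points:
    print(points)
```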
It's safer to use an XML parser for this. Here is one way to do it (xml.etree is included in the standard library):
In [32]: import xml.etree.ElementTree as ET
In [33]: root = ET.fromstring(txt)
Since your data is already a single root element, you don't need any further extraction:
In [35]: root.tag
Out[35]: 'polyline'
And all the properties are actually XML attributes, converted to a dictionary:
In [37]: root.attrib
Out[37]:
{'points': '239,274 239,274 239,274 239,275 239,275 238,276 238,276 237,276 237,276 236,276 236,276 236,277 236,277 235,277 235,277 234,278 234,278 233,279 233,279 232,280 232,280 231,280 231,280 230,280 230,280 230,280 229,280 229,280', 'style': 'fill: none; stroke: #000000; stroke-width: 1; stroke-linejoin: round; stroke-linecap: round; stroke-antialiasing: false; stroke-antialias: 0; opacity: 0.8'}
So here you have it:
In [38]: root.attrib['points']
Out[38]: '239,274 239,274 239,274 239,275 239,275 238,276 238,276 237,276 237,276 236,276 236,276 236,277 236,277 235,277 235,277 234,278 234,278 233,279 233,279 232,280 232,280 231,280 231,280 230,280 230,280 230,280 229,280 229,280'
If you would like to split this further into groups at the spaces and commas, I would do this:
Get all groups separated by whitespace using split with no arguments:
>>> p = g.group(1).split()
>>> p
['239,274', '239,274', '239,274', '239,275', '239,275', '238,276', '238,276', '237,276', '237,276', '236,276', '236,276', '236,277', '236,277', '235,277', '235,277', '234,278', '234,278', '233,279', '233,279', '232,280', '232,280', '231,280', '231,280', '230,280', '230,280', '230,280', '229,280', '229,280']
Now for each string, split it at the comma, which returns a list of strings. I use map to convert each such list to a list of ints:
>>> p2 = [list(map(int, numbers.split(','))) for numbers in p]
>>> p2
[[239, 274], [239, 274], [239, 274], [239, 275], [239, 275], [238, 276], [238, 276], [237, 276], [237, 276], [236, 276], [236, 276], [236, 277], [236, 277], [235, 277], [235, 277], [234, 278], [234, 278], [233, 279], [233, 279], [232, 280], [232, 280], [231, 280], [231, 280], [230, 280], [230, 280], [230, 280], [229, 280], [229, 280]]
And this will shed some more light:
>>> '123,456'.split(',')
['123', '456']
>>> list(map(int, '123,456'.split(',')))
[123, 456]
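The split-and-convert steps above could be wrapped into small helpers. This is a sketch rather than a definitive implementation: parse_points and format_points are hypothetical names, the code assumes integer coordinates as in the question's data, and it splits on any mix of commas and whitespace because, as far as I know, SVG allows either as a separator in points.

```python
import re


def parse_points(points_attr):
    """Turn a polyline 'points' string into a list of [x, y] integer pairs."""
    # Split on any run of commas and/or whitespace, dropping empty pieces.
    numbers = [int(n) for n in re.split(r'[\s,]+', points_attr.strip()) if n]
    if len(numbers) % 2:
        raise ValueError('odd number of coordinates')
    return [[numbers[i], numbers[i + 1]] for i in range(0, len(numbers), 2)]


def format_points(pairs):
    """Format pairs without spaces, matching the layout asked for in the question."""
    return '[' + ','.join('[%d,%d]' % (x, y) for x, y in pairs) + ']'


print(format_points(parse_points('239,274 239,275  238,276')))
# prints [[239,274],[239,275],[238,276]]
```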
Upvotes: 2