parsing CDATA (one more)

Question

I need to parse CDATA from the following SVG document:

The code I'm using is as follows:

import xml.dom.minidom

file_svg = "my_path"
doc = xml.dom.minidom.parse(file_svg)
style = doc.getElementsByTagName('style')
cdata = style[0].firstChild.wholeText

This gives me only the text inside the CDATA section (output of print(cdata)):


text.f0 {font-family:cmex10;font-size:11.955168px}
text.f1 {font-family:cmmi12;font-size:11.955168px}
text.f2 {font-family:cmr12;font-size:11.955168px}

But I need this text organized into something like this:

{"f0":"cmex10","f1":"cmmi12","f2":"cmr12"}

I'm sure there is a way to extract the class names (f0, f1, f2) and the font-family values (cmex10, cmmi12, cmr12) with standard xml.dom.minidom operations.

I tried:

style[0].firstChild.nodeValue

but it produced an empty string.

Could you help me with this?

Alexandra Dudkina · Accepted Answer

As pointed out in comments, CDATA should be parsed as text. Here is an example of simple parsing:

text = '''text.f0 {font-family:cmex10;font-size:11.955168px}
text.f1 {font-family:cmmi12;font-size:11.955168px}
text.f2 {font-family:cmr12;font-size:11.955168px}'''

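d = {}

for line in text.split('\n'):
    # class name: between the first '.' and the following space, e.g. 'f0'
    key = line.split('.')[1].split(' ')[0]
    # font family: between the first ':' and the following ';', e.g. 'cmex10'
    value = line.split(':')[1].split(';')[0]
    d[key] = value

print(d)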

Output:

{'f0': 'cmex10', 'f1': 'cmmi12', 'f2': 'cmr12'}
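
For completeness, here is a rough end-to-end sketch that combines the minidom lookup from the question with the text parsing above. The SVG string is a made-up stand-in (the real document is not shown in the question), and the helper name font_families and the regular expression are my own assumptions about what the stylesheet looks like, so treat it as a starting point rather than a general CSS parser.

```python
import re
import xml.dom.minidom
from xml.dom import Node

# Hypothetical SVG, assumed to resemble the document from the question.
SVG = """<svg xmlns="http://www.w3.org/2000/svg">
<style type="text/css">
<![CDATA[
text.f0 {font-family:cmex10;font-size:11.955168px}
text.f1 {font-family:cmmi12;font-size:11.955168px}
text.f2 {font-family:cmr12;font-size:11.955168px}
]]>
</style>
</svg>"""

# Assumed rule shape: "text.<class> { ... font-family:<name> ... }"
RULE = re.compile(r'text\.(\w+)\s*\{[^}]*?font-family\s*:\s*([^;}]+)')


def font_families(svg_text):
    """Map each CSS class in the first <style> element to its font family."""
    doc = xml.dom.minidom.parseString(svg_text)
    style = doc.getElementsByTagName('style')[0]
    # Join plain text and CDATA children; minidom keeps them as separate nodes.
    css = ''.join(
        node.data
        for node in style.childNodes
        if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )
    return {cls: family.strip() for cls, family in RULE.findall(css)}


fonts = font_families(SVG)
print(fonts)  # {'f0': 'cmex10', 'f1': 'cmmi12', 'f2': 'cmr12'}
```

If the style element has a line break before the CDATA section, as in this sketch, its firstChild is a whitespace-only text node, which may be why nodeValue looked empty in the question. Joining all text and CDATA children avoids depending on which node happens to come first.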
