Francesco Aru

Reputation: 21

Parse XML with multiline attribute

I'm creating a Python script to modify an XML file. Say I have this kind of tag:

<z:row MGFF_SCRIPT='
        If Variabili(&#x22;UFFICIOPA&#x22;) = &#x22;&#x22; Then
            elemento = &#x22;0000000&#x22;
        Else
            elemento = Variabili(&#x22;UFFICIOPA&#x22;)
        End If
        '/>

I need to read the value of the MGFF_SCRIPT attribute, modify it, and write it back in the same position. The problem is that when I get the value (Element.get(key)) and store it in a Python string, the newlines and indentation are gone: it's a single-line string. So when I set the modified value on the attribute and write the new XML, the content of MGFF_SCRIPT ends up on a single line. Because the content is a script, this causes a lot of problems. How can I parse the attribute content while keeping the newlines and indentation?

Upvotes: 2

Views: 919

Answers (2)

yazz

Reputation: 331

As @tdelaney said, we can replace the literal newlines with the character reference &#10; before parsing, modify the value, and then restore the newlines afterwards.

import re

xml_text = '''
<z:row MGFF_SCRIPT='
        If Variabili(&#x22;UFFICIOPA&#x22;) = &#x22;&#x22; Then
            elemento = &#x22;0000000&#x22;
        Else
            elemento = Variabili(&#x22;UFFICIOPA&#x22;)
        End If
        '/>
'''

# Matches any single-quoted block. This assumes attribute values are
# single-quoted and contain no literal apostrophes.
quoted = re.compile(r"'[^']+'")

# replace: escape literal newlines as &#10; so the XML parser keeps them
xml_text = quoted.sub(lambda m: m.group(0).replace('\n', '&#10;'), xml_text)
print(xml_text)

# restore: turn the character references back into literal newlines
xml_text = quoted.sub(lambda m: m.group(0).replace('&#10;', '\n'), xml_text)
print(xml_text)
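
For a fuller picture, here is a minimal sketch of the whole round trip with xml.etree.ElementTree: escape the newlines, parse, edit the attribute, serialize. The xmlns:z declaration and its '#RowsetSchema' URI are assumptions added so the fragment parses on its own (the question does not show the declaration), protect_newlines is a hypothetical helper name, and the sample script is shortened. The regex is a heuristic, not a real XML tokenizer.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the namespace declaration is added here only so the
# fragment is well-formed on its own; the question omits it.
SOURCE = """<z:row xmlns:z='#RowsetSchema' MGFF_SCRIPT='
        If Variabili(&#x22;UFFICIOPA&#x22;) = &#x22;&#x22; Then
            elemento = &#x22;0000000&#x22;
        End If
        '/>"""

# Heuristic: an '=' followed by a quoted string that contains no '<'.
ATTR_VALUE = re.compile(r"""=\s*("[^"<]*"|'[^'<]*')""")


def protect_newlines(xml_text):
    """Escape literal newlines inside quoted attribute values as &#10;."""
    return ATTR_VALUE.sub(
        lambda m: m.group(0).replace("\n", "&#10;"), xml_text
    )


# Parse the protected text; the root element is the z:row itself.
row = ET.fromstring(protect_newlines(SOURCE))
script = row.get("MGFF_SCRIPT")  # newlines and indentation intact

# Example modification: change the default value inside the script.
modified = script.replace("0000000", "1111111")
row.set("MGFF_SCRIPT", modified)

# Serialize back to text.
output = ET.tostring(row, encoding="unicode")
print(output)
```

ElementTree's serializer should write the newlines in the attribute back as character references rather than literal line breaks, so the value survives another parse. If the file itself must contain literal line breaks, the restore step still has to be run on the serialized text.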

Upvotes: 2

Michael Kay

Reputation: 163262

It's a rather unfortunate rule in the XML specification that XML parsers are required to do attribute value normalization, which means that literal newlines in attribute values are replaced by spaces. Unless your XML parser has an option to suppress this (and most don't, because the spec requires it) you're stuck with it. Newlines written as character references such as &#10; are not affected by this rule.
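
The normalization is easy to observe with Python's xml.etree.ElementTree. The sketch below parses the same value twice, once with a literal newline and once with the newline written as &#10;. The element and attribute names (row, script) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same attribute value twice: a literal newline vs. a character reference.
raw = "<row script='line1\n    line2'/>"
escaped = "<row script='line1&#10;    line2'/>"

literal_value = ET.fromstring(raw).get("script")
escaped_value = ET.fromstring(escaped).get("script")

# The literal newline is normalized to a space by the parser,
# while the character reference comes through as a real newline.
print(repr(literal_value))
print(repr(escaped_value))
```

Note that only the line breaks are lost in the first case: the indentation spaces are still there, they just end up on one line.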

Upvotes: 3
