Jedani

Reputation: 83

Why does ElementTree in Python add extra newlines and spaces to XML?

How can I change the appearance of my XML from, e.g.:

 <root>
     <elem1>
         <value>
            122
         </value>
         <text>
            This_is_just_a_text
         </text>
     </elem1>
     <elem1>
         <value>
            122
         </value>
         <text>
            This_is_just_a_text
         </text>
     </elem1>   
 </root>

to something that looks like this:

 <root>
     <elem1>
         <value>122</value>
         <text>This_is_just_a_text</text>
     </elem1>
     <elem1>
         <value>122</value>
         <text>This_is_just_a_text</text>
     </elem1>   
 </root>

I'm just wondering what causes this to happen. By the way, the function below is used to add the indentation:

 import xml.etree.ElementTree as ET
 from xml.dom import minidom

 def prettify(elem):
     """
         Return a pretty-printed XML string for the Element.
     """
     rough_string = ET.tostring(elem, 'utf-8')
     reparsed = minidom.parseString(rough_string)
     return reparsed.toprettyxml(indent="\t")
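A quick way to see where the extra whitespace comes from is to parse an already-indented document and inspect an element's text. This is a minimal sketch; the sample document and variable names are illustrative only. ElementTree keeps the newlines and spaces around "122" as part of the element's .text, and a pretty-printer then adds its own indentation on top of them.

```python
import xml.etree.ElementTree as ET

# Illustrative input that is already indented, like the XML in the question
source = """<root>
  <elem1>
    <value>
      122
    </value>
  </elem1>
</root>"""

root = ET.fromstring(source)
value = root.find("elem1/value")

# The surrounding newlines and spaces are stored as part of the text
print(repr(value.text))
print(repr(value.text.strip()))
```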

Upvotes: 3

Views: 4817

Answers (2)

Jedani

Reputation: 83

I'm posting this answer for anyone who runs into the same problem.

Here is what I found out: there was a bug in the built-in toprettyxml() method in all Python versions before 2.7.3 that added redundant spaces and newlines to the XML output. If you have Python 2.7.3 or later, the prettify() function from the question should work and you shouldn't see any extra lines or spaces. If you are using an older version, you can fix the output with a regular expression:

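 import re
 import xml.etree.ElementTree as ET
 from xml.dom import minidom

 def prettify(elem):
     """
         Return a pretty-printed XML string for the Element.
     """
     rough_string = ET.tostring(elem, 'utf-8')
     reparsed = minidom.parseString(rough_string)
     uglyXml = reparsed.toprettyxml(indent="\t")
     # Collapse ">\n<indent>text\n<indent></" into ">text</" (raw strings avoid invalid-escape warnings)
     pattern = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)
     return pattern.sub(r'>\g<1></', uglyXml)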
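To check what the regular expression does, here is a small standalone sketch that applies the same pattern to a string. The sample string is hand-written to mimic the kind of output the buggy toprettyxml() produced; it is not output captured from an old Python version, and collapse_text_nodes is a hypothetical helper name.

```python
import re

# Same pattern as above: a text-only element whose text sits on its own
# indented line gets pulled back onto the tag's line
TEXT_NODE_RE = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)

def collapse_text_nodes(ugly_xml):
    """Hypothetical helper: rewrite '>\\n  text\\n  </' as '>text</'."""
    return TEXT_NODE_RE.sub(r'>\g<1></', ugly_xml)

# Hand-written sample imitating the redundant whitespace
ugly = ("<root>\n"
        "\t<value>\n\t\t122\n\t</value>\n"
        "\t<text>\n\t\tThis_is_just_a_text\n\t</text>\n"
        "</root>\n")

print(collapse_text_nodes(ugly))
```

Elements whose content starts with another tag are left alone, because `[^<>\s]` requires the first non-whitespace character after `>` to be text rather than `<`.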

See also: Pretty printing XML in Python

Upvotes: 2

Robᵩ

Reputation: 168626

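When ElementTree parses XML that is already indented, that whitespace is kept: an Element stores its contained text (.text) and the text that follows it (.tail) as regular strings, and the pretty-printer then adds its own indentation on top. So you can call str.strip() on .text and .tail to get rid of the unwanted whitespace before pretty-printing.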

import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom

def prettify(elem):
    """
        Return a pretty-printed XML string for the Element.
    """
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

def strip(elem):
    # Remove the whitespace that was parsed into .text and .tail of every element
    for node in elem.iter():
        if node.text:
            node.text = node.text.strip()
        if node.tail:
            node.tail = node.tail.strip()

xml = ET.XML('''<elem1>
         <value>
            122
         </value>
         <text>
            This_is_just_a_text
         </text>
     </elem1>''')

strip(xml)
print(prettify(xml))

Result:

<?xml version="1.0" ?>
<elem1>
    <value>122</value>
    <text>This_is_just_a_text</text>
</elem1>
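On Python 3.9 or newer, a possible alternative that skips minidom is ElementTree's own ET.indent(). This is a sketch under that version assumption, and it uses a different technique from the answer above. ET.indent() only replaces text and tails that are empty or whitespace-only, so the same stripping step is still needed for text like "122"; strip_whitespace is a hypothetical name for it.

```python
import xml.etree.ElementTree as ET

def strip_whitespace(root):
    """Hypothetical helper: strip whitespace from every .text and .tail."""
    for node in root.iter():
        if node.text:
            node.text = node.text.strip()
        if node.tail:
            node.tail = node.tail.strip()

source = """<elem1>
         <value>
            122
         </value>
         <text>
            This_is_just_a_text
         </text>
     </elem1>"""

root = ET.fromstring(source)
strip_whitespace(root)
# ET.indent() requires Python 3.9+; it rewrites only empty or
# whitespace-only text/tail values, indenting one tab per level
ET.indent(root, space="\t")
print(ET.tostring(root, encoding="unicode"))
```

Unlike the minidom route, `encoding="unicode"` returns a str without an XML declaration.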

Upvotes: 6
