Comments in ET: more than the comment tag

Question

I have an XML file like this:




    
        
            
            
            Inhalt 1
        
            Inhalt 2 ergänzt mit Umlaut
        
        
            
            Inhalt 3
        
            Inhalt 3

Now I want to iterate over the comments. I did it with lxml.etree, imported as ET:

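from lxml import etree as ET

root = ET.parse('file.xml').getroot()  # 'file.xml' is a placeholder name
comments = root.xpath('//comment()')
for comment in comments:
    print(ET.tostring(comment))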

But instead of printing only the comments, it also prints the text that follows each comment in the parent node:

b''
b''
b'Inhalt 1'
b''
b'mit Umlaut
		'
b''
b'Inhalt 3'
b'
		'

Can someone explain why this happens, and how I could change the XPath expression (if that is the right place) so that it returns only the comment nodes, without the following text appended to each comment?

Thank you!

mzjn · Accepted Answer

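The XPath expression is not the problem; it returns only the comment nodes. The extra text is each comment's tail (the text that follows the comment, up to the next node), and tostring() writes the tail together with the node by default (with_tail=True; see https://lxml.de/api/lxml.etree-module.html#tostring).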
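A minimal sketch of where the extra text lives. The original XML did not survive, so the small inline document and the comment text below are hypothetical stand-ins. The comment's own content is in `text`, and the text after it is in `tail`:

```python
from lxml import etree as ET

# Hypothetical stand-in for the question's file: one comment followed by text.
root = ET.fromstring(b"<doc><a><!-- note -->Inhalt 1</a></doc>")

comment = root.xpath('//comment()')[0]
print(repr(comment.text))    # content between <!-- and -->
print(repr(comment.tail))    # text after the comment, stored on the comment node
print(ET.tostring(comment))  # default with_tail=True also writes the tail
```

The tail belongs to the comment node in lxml's data model, which is why serializing the comment drags the following text along with it.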

To get rid of the tails, change

print(ET.tostring(comment))

to

print(ET.tostring(comment, with_tail=False))
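Applied to the loop from the question, the change might look like this; the inline document is a hypothetical stand-in with two comments, each followed by tail text:

```python
from lxml import etree as ET

# Hypothetical stand-in document: each comment is followed by tail text.
root = ET.fromstring(
    b"<doc><a><!-- first -->Inhalt 1</a><b><!-- second -->Inhalt 2</b></doc>"
)

serialized = []
for comment in root.xpath('//comment()'):
    # with_tail=False stops tostring() from appending the comment's tail text.
    serialized.append(ET.tostring(comment, with_tail=False))

for item in serialized:
    print(item)
```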

If you are just interested in the content of the comments and not the markup, use this:

print(comment.text)
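Note that `comment.text` keeps any whitespace between `<!--` and `-->`, so stripping it may be useful. A short sketch, again on a hypothetical stand-in document:

```python
from lxml import etree as ET

# Hypothetical stand-in document with comments, one of them padded with spaces.
root = ET.fromstring(b"<doc><!-- first --><p>Inhalt 1</p><!--second--></doc>")

# comment.text excludes both the <!-- --> markup and the tail text;
# strip() removes the padding spaces inside the comment.
texts = [comment.text.strip() for comment in root.xpath('//comment()')]
print(texts)
```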
