Why does this element in lxml include the tail?

Question

Consider this Python script:

from lxml import etree

html = '''
<html>
<body>
  <p>
    This is some text followed with 2 citations.<span class="footnote">1</span>
       <span class="footnote">2</span>This is some more text.
  </p>
</body></html>'''

tree = etree.fromstring(html)

for element in tree.findall(".//{*}span"):
    if element.get("class") == 'footnote':
        print(etree.tostring(element, encoding="unicode", pretty_print=True))

The desired output is just the two span elements, but instead I get:

<span class="footnote">1</span>
<span class="footnote">2</span>This is some more text.

Why does it include the text after the element until the end of the parent element?

I'm trying to use lxml to link footnotes. When I call a.insert() to put the span element into the a element I create for it, the text after the span comes along with it, so large amounts of text that I don't want linked end up inside the link.

falsetru · Accepted Answer

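lxml stores the text that follows an element's closing tag on that element, as its .tail, and tostring() serializes the tail by default. Specifying with_tail=False leaves the tail text out of the output (the tail itself stays in the tree).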

print(etree.tostring(element, encoding="unicode", pretty_print=True, with_tail=False))

See lxml.etree.tostring documentation.
