Anna

Reputation: 409

TypeError: object of type 'lxml.etree._ElementTree' has no len()

I'm trying to erase some empty text tags in an XML file returned by a Python function, but I get this error: TypeError: object of type 'lxml.etree._ElementTree' has no len(). Why?

This is the function:

from bs4 import BeautifulSoup
from lxml import etree

def due(pdfpath):
    ntree = uniform_cm(pdfpath)
    etree.strip_tags(ntree, 'textline')

    # Search for all "textbox" elements
    for textbox in ntree.xpath('//textbox'):
        new_line = etree.Element("new_line")
        previous_bb = None

        # From a given textbox element, iterate over all the "text" elements
        for x in textbox.iter("text"):
            # Get the current bb value
            bb = getBBoxFirstValue(x)
            # Check that the current and previous values aren't empty
            if bb is not None and previous_bb is not None and (bb - previous_bb) > 20:
                # Insert new_line into the parent tag
                x.getparent().insert(x.getparent().index(x), new_line)

                # A new "new_line" element is created
                new_line = etree.Element("new_line")

            # Append the current element to the new_line tag
            new_line.append(x)

            # Keep the latest non-empty BBox first value
            if bb is not None:
                previous_bb = bb

        # Add the last new_line element
        textbox.append(new_line)
    tree = ntree

    soup = BeautifulSoup(tree, "lxml")

    for x in soup.find_all():
        if len(x.get_text(strip=True)) == 0:
            x.extract()

    return tree
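As an aside to the question: since due() already holds an lxml tree, empty elements could also be removed with lxml alone, without handing the tree to BeautifulSoup. This is a minimal sketch, not the asker's or answerer's method; remove_empty is a hypothetical helper, and "empty" is assumed to mean no non-whitespace text anywhere inside the element, mirroring the question's len(x.get_text(strip=True)) == 0 check.

```python
from lxml import etree

def remove_empty(tree):
    """Hypothetical helper: remove elements with no non-whitespace text inside.

    Accepts an lxml _ElementTree or _Element and modifies it in place.
    """
    root = tree.getroot() if isinstance(tree, etree._ElementTree) else tree
    # Snapshot the elements first; removing nodes while iterating is unsafe.
    # tag=etree.Element restricts the walk to elements (no comments or PIs).
    for el in list(root.iter(tag=etree.Element)):
        parent = el.getparent()
        if parent is None:          # the root itself is never removed
            continue
        if "".join(el.itertext()).strip():
            continue                # the element still contains real text
        # In lxml the tail text belongs to the element and would be removed
        # with it, so move it onto the previous sibling (or the parent) first.
        if el.tail:
            prev = el.getprevious()
            if prev is not None:
                prev.tail = (prev.tail or "") + el.tail
            else:
                parent.text = (parent.text or "") + el.tail
        parent.remove(el)
    return tree

doc = etree.fromstring(b"<r><a>keep</a><b> </b>mid<c><d/></c>end</r>")
remove_empty(doc)
print(etree.tostring(doc))  # b'<r><a>keep</a>midend</r>'
```

Inside due(), something like return remove_empty(ntree) would then take the place of the BeautifulSoup loop, and the returned tree itself reflects the removals.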

Upvotes: 1

Views: 1297

Answers (1)

Valdi_Bo

Reputation: 30971

The only call to len in your code sample is in: if len(x.get_text(strip=True)) == 0:

But I checked type(x) there and got bs4.element.Tag, whereas your error message says that an 'lxml.etree._ElementTree' has no len().

So apparently your error occurred in some other place.

A piece of advice for the future: when you look for the cause of an exception, state precisely on which line it occurred. The stack trace shows this.
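A minimal sketch of reading that information programmatically with the standard traceback module; count_items is a hypothetical function used only to trigger the error. The last entry of the extracted traceback is the frame where the exception was raised. If the exception is raised inside a library, that entry points into the library, and the line of your own code that called it appears in an earlier entry.

```python
import traceback

def count_items(obj):
    # Hypothetical function: len() fails for objects without __len__
    return len(obj)

try:
    count_items(object())
except TypeError as exc:
    # extract_tb() returns one FrameSummary per frame, innermost last
    frames = traceback.extract_tb(exc.__traceback__)
    innermost = frames[-1]
    message = str(exc)
    print(f"{type(exc).__name__} in {innermost.name}() at line {innermost.lineno}: {message}")
    # List every frame, outermost first, as the interpreter's report does
    for frame in frames:
        print(f"  {frame.filename}:{frame.lineno} in {frame.name}")
```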

So I investigated the error on its own, independently of your code sample.

When you parse an XML file using lxml, e.g.:

from lxml import etree as et
tree = et.parse('Input.xml')

the type of tree (the whole XML document) is lxml.etree._ElementTree.

If you now attempt to run len(tree), you get:

TypeError: object of type 'lxml.etree._ElementTree' has no len()
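A minimal, self-contained sketch of the behaviour described above. It parses an in-memory document through io.BytesIO instead of Input.xml (an assumption made only so the snippet runs without a file); et.parse accepts a file-like object as well as a filename.

```python
from io import BytesIO
from lxml import etree as et

# In-memory stand-in for Input.xml
xml_bytes = b"<root><a/><b><c/></b></root>"
tree = et.parse(BytesIO(xml_bytes))

print(type(tree))  # <class 'lxml.etree._ElementTree'>

try:
    len(tree)
except TypeError as exc:
    message = str(exc)
    print(message)  # object of type 'lxml.etree._ElementTree' has no len()
```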

But when you get the root element from this tree with root = tree.getroot(), the type of root is lxml.etree._Element (note that now you have an element, not the whole document), and you can run len(root), which returns the number of its direct children. The same holds for any other element in the XML tree.
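A short sketch of this, again with a small in-memory document standing in for a real file. It shows that len() on an _Element counts only direct children, not all descendants:

```python
from io import BytesIO
from lxml import etree as et

tree = et.parse(BytesIO(b"<root><a/><b><c/><d/></b></root>"))
root = tree.getroot()          # lxml.etree._Element: the document's root element

print(type(root))              # <class 'lxml.etree._Element'>
print(len(root))               # 2: only the direct children <a/> and <b>

b = root.find("b")
print(len(b))                  # 2: <c/> and <d/>

# For comparison: all descendants (iter() also yields root itself, hence - 1)
descendants = len(list(root.iter())) - 1
print(descendants)             # 4
```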

Note also the following inconsistency in lxml:

When you read XML content from a string, e.g. root = et.XML(some_text_variable), the result is the root element, not the document tree.

So here you can call len(root) directly.
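A minimal sketch of the string case; the XML content of some_text_variable is made up for illustration. If you need the document-level object afterwards, getroottree() wraps the element back into an _ElementTree:

```python
from lxml import etree as et

some_text_variable = "<root><item>1</item><item>2</item><item>3</item></root>"

root = et.XML(some_text_variable)   # returns the root _Element, not an _ElementTree
print(type(root))                   # <class 'lxml.etree._Element'>
print(len(root))                    # 3

tree = root.getroottree()           # back to the whole-document object if needed
print(type(tree))                   # <class 'lxml.etree._ElementTree'>
```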

Upvotes: 2
