Ithilion

Reputation: 140

Python XPath include missing elements

<tree>
    <item>
        <element1>somedata</element1>
        <element2>moredata</element2>
        <element3>data?</element3>
        <optional_element>data!</optional_element>
    </item>
    <item>
        <element1>somedata</element1>
        <element2>moredata</element2>
        <element3>data?</element3>
    </item>
    <item>
        <element1>somedata</element1>
        <element2>moredata</element2>
        <element3>data?</element3>
        <optional_element>data!</optional_element>
    </item>
    <item>
        <element1>somedata</element1>
        <element2>moredata</element2>
        <element3>data?</element3>
    </item>
</tree>

I have an XML document like this one. What I am trying to accomplish is to get this kind of output: ["data!", "", "data!", ""] instead of just ["data!", "data!"], i.e. one entry per <item>, with an empty string wherever optional_element is missing.
So far I have tried this approach without being able to make it work (the list still includes only the elements that are present).
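The original attempt isn't shown, so the following is only an illustration of the general pattern, assuming Python's standard xml.etree.ElementTree and a trimmed copy of the XML above. A single query over the whole tree matches only the elements that exist, so the items without <optional_element> leave no trace in the result:

```python
import xml.etree.ElementTree as ET

# Trimmed version of the document above: only the optional child is kept.
XML = """<tree>
    <item><optional_element>data!</optional_element></item>
    <item></item>
    <item><optional_element>data!</optional_element></item>
    <item></item>
</tree>"""

root = ET.fromstring(XML)

# A document-wide query returns only matching elements, so the
# position of each missing <optional_element> is lost.
flat = [el.text for el in root.findall(".//optional_element")]
print(flat)  # ['data!', 'data!']
```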

Upvotes: 1

Views: 146

Answers (1)

alecxe

Reputation: 473863

I would use findtext() and specify the default:

[item.findtext("optional_element", default="") for item in tree.findall("item")]
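The same one-liner should also work with the standard library's xml.etree.ElementTree, whose Element.findtext(match, default=None) returns the default when no matching child exists. A minimal sketch, assuming the XML is held in a string (trimmed to the relevant elements); note that a present but empty <optional_element/> yields "" rather than the default:

```python
import xml.etree.ElementTree as ET

XML = """<tree>
    <item>
        <element1>somedata</element1>
        <optional_element>data!</optional_element>
    </item>
    <item>
        <element1>somedata</element1>
    </item>
    <item>
        <element1>somedata</element1>
        <optional_element>data!</optional_element>
    </item>
    <item>
        <element1>somedata</element1>
    </item>
</tree>"""

tree = ET.fromstring(XML)  # the root <tree> element

# Loop per <item> so every item contributes exactly one entry;
# findtext() falls back to "" when <optional_element> is absent.
values = [item.findtext("optional_element", default="") for item in tree.findall("item")]
print(values)  # ['data!', '', 'data!', '']
```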

Demo (using lxml):

>>> from lxml import etree
>>> 
>>> data = """<?xml version="1.0" encoding="utf-8"?>
... <tree>
...     <item>
...         <element1>somedata</element1>
...         <element2>moredata</element2>
...         <element3>data?</element3>
...         <optional_element>data!</optional_element>
...     </item>
...     <item>
...         <element1>somedata</element1>
...         <element2>moredata</element2>
...         <element3>data?</element3>
...     </item>
...     <item>
...         <element1>somedata</element1>
...         <element2>moredata</element2>
...         <element3>data?</element3>
...         <optional_element>data!</optional_element>
...     </item>
...     <item>
...         <element1>somedata</element1>
...         <element2>moredata</element2>
...         <element3>data?</element3>
...     </item>
... </tree>
... """
>>> 
>>> tree = etree.fromstring(data)
>>> print([item.findtext("optional_element", default="") for item in tree.findall("item")])
['data!', '', 'data!', '']
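If several children per item can be missing, the same findtext(..., default="") call can be applied to each tag to build one row per <item>. A sketch using the standard library; items_to_rows and the tag list are hypothetical names chosen for illustration:

```python
import xml.etree.ElementTree as ET


def items_to_rows(root, tags, default=""):
    """Return one dict per <item> child of root (hypothetical helper).

    Each requested tag maps to its text, or to `default` when the
    item has no such child.
    """
    return [
        {tag: item.findtext(tag, default=default) for tag in tags}
        for item in root.findall("item")
    ]


XML = """<tree>
    <item><element1>somedata</element1><optional_element>data!</optional_element></item>
    <item><element1>somedata</element1></item>
</tree>"""

rows = items_to_rows(ET.fromstring(XML), ["element1", "optional_element"])
print(rows)
```

Keeping the loop at the <item> level is what preserves alignment: every item produces a row whether or not its children are present.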

Upvotes: 3
