user3340499

Reputation: 199

lxml XPath search with two conditions

My XML file is:

<releases>
    <release id="1">
        <title>Title1</title>
        <formats>
            <format name="CD" qty="2" text="">
            </format>
        </formats>
        <released>2016-02-00</released>
    </release>
    <release id="2">
        <title>Title2</title>
        <formats>
            <format name="LP" qty="2" text="">
            </format>
        </formats>
        <released>2018-03-00</released>
    </release>
    <release id="3">
        <title>Title3</title>
        <formats>
            <format name="CD" qty="1" text="">
            </format>
        </formats>
        <released>1995-01-15</released>
    </release>  
</releases>

In Python 3, I want to find the release IDs where the "format" name is "CD" and the "released" text contains "1995" (so release id 3 should be the result).

I have this code which finds the CD releases, and prints the release dates:

for rls in root.findall(".//format[@name='CD']....//released"):
    print (rls.tag, rls.attrib, rls.text)

And I also have this code which finds all the releases with "1995" and prints the date of the first result:

print (root.xpath("/releases/release/released[contains(text(),'1995')]")[0].text)

I'm having trouble working out how to combine the two conditions into a single query (I'm also using findall in one and xpath in the other, which isn't pretty).

Upvotes: 0

Views: 1459

Answers (2)

James

Reputation: 36598

You can combine the conditions in the predicate portion of a selector in XPath. The following tells XPath to:

  • return all release nodes that contain:
    • a format node with the attribute name='CD', and
    • a released node whose text contains 1995

xml.xpath("./release[.//format[@name='CD'] and .//released[contains(text(),'1995')]]/@id")
# returns:
['3']
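
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It assumes the XML is parsed with lxml.etree.fromstring and that the parsed root element is called root (the line above calls it xml); the sample data is trimmed from the question to the fields the query touches.

```python
from lxml import etree

# Sample data trimmed from the question (only the fields the query uses).
XML = """
<releases>
    <release id="1">
        <formats><format name="CD" qty="2" text=""/></formats>
        <released>2016-02-00</released>
    </release>
    <release id="2">
        <formats><format name="LP" qty="2" text=""/></formats>
        <released>2018-03-00</released>
    </release>
    <release id="3">
        <formats><format name="CD" qty="1" text=""/></formats>
        <released>1995-01-15</released>
    </release>
</releases>
"""

# root is the <releases> element, so "./release" selects its children.
root = etree.fromstring(XML)

# Both conditions sit in one predicate, joined with "and";
# the trailing /@id returns the attribute values instead of the elements.
ids = root.xpath(
    "./release[.//format[@name='CD'] and "
    ".//released[contains(text(), '1995')]]/@id"
)
print(ids)  # ['3']
```

Note that in XPath 1.0 contains(text(), '1995') only tests the first text node of released. That is fine for this data, but contains(., '1995') would be the more forgiving spelling if released could ever contain child elements.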

Upvotes: 1

kjhughes

Reputation: 111491

This XPath,

/releases/release[formats/format/@name='CD'][starts-with(released,'1995')]

will select those release elements in CD format whose released date starts with 1995,

<release id="3">
    <title>Title3</title>
    <formats>
        <format name="CD" qty="1" text="">
        </format>
    </formats>
    <released>1995-01-15</released>
</release>  

as requested.

You mentioned wanting the id attributes. If you actually do want to iterate over all such id attributes rather than the elements themselves, simply append /@id to the above XPath.
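
As a rough sketch of both variants in lxml (assuming the document is parsed from a string into a variable named root, and using a trimmed copy of the question's data), the same expression can be used to iterate over the elements or, with /@id appended, to get the ids directly:

```python
from lxml import etree

# Sample data trimmed from the question (only the fields the query uses).
XML = """
<releases>
    <release id="1">
        <formats><format name="CD" qty="2" text=""/></formats>
        <released>2016-02-00</released>
    </release>
    <release id="2">
        <formats><format name="LP" qty="2" text=""/></formats>
        <released>2018-03-00</released>
    </release>
    <release id="3">
        <formats><format name="CD" qty="1" text=""/></formats>
        <released>1995-01-15</released>
    </release>
</releases>
"""

root = etree.fromstring(XML)

# Two chained predicates: each one filters the result of the previous one.
query = "/releases/release[formats/format/@name='CD'][starts-with(released, '1995')]"

# Variant 1: work with the matching <release> elements themselves.
for release in root.xpath(query):
    print(release.get("id"), release.findtext("released"))  # 3 1995-01-15

# Variant 2: append /@id to get the attribute values directly.
release_ids = root.xpath(query + "/@id")
print(release_ids)  # ['3']
```

Because the date is tested with starts-with rather than contains, a value such as 2016-19-95 would not match by accident.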

Upvotes: 1
