dingmatt

Reputation: 23

How to use python XPath to return a parent element with filtered children

I'm working with a dataset where the 'parent' element tag is unknown at runtime, but I need to return both it and any child elements that have a certain attribute value. I was hoping to do this with XPath, but I'm no longer certain I can. Can anyone give me a hand?

Here's an example dataset:

<Images>
    <Unknown1>
        <Image url="http://a.jpg" type="art" id="1"/>
    </Unknown1>
    <Unknown2>
        <Image url="http://b.jpg" type="art" id="1"/>
        <Image url="http://c.jpg" type="art" id="2"/>
        <Image url="http://d.jpg" type="draft" id="3"/>
        <Image url="http://e.jpg" type="draft" id="4"/>
        <Image url="http://f.jpg" type="poster" id="5"/>
        <Image url="http://g.jpg" type="poster" id="6"/>
    </Unknown2>
</Images>

Now I need to filter by 'type', so what I'm looking to return is something like this (if I filtered on 'art'):

    <Unknown1>
        <Image url="http://a.jpg" type="art" id="1"/>
    </Unknown1>
    <Unknown2>
        <Image url="http://b.jpg" type="art" id="1"/>
        <Image url="http://c.jpg" type="art" id="2"/>
    </Unknown2>

Annoyingly, I don't simply need a list of all the 'Image' elements. Instead I need a list containing the 'Unknown' elements (the actual tags are not known at runtime) together with the filtered children they contain, structured as above.

Is there any guru who could help me out? A pure XPath solution would be preferable, but I'm not sure it's feasible.

Thanks in advance.

Upvotes: 2

Views: 2945

Answers (1)

Risadinha

Reputation: 16671

This XPath expression selects the parent elements you are after. There might be alternatives, and I have not checked whether it works with lxml:

//*[@type='art']/parent::*

Or you can restrict it to:

//Image[@type='art']/parent::*
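As a quick check, the second expression can be tried from Python. This is only a minimal sketch: it assumes the standard library's xml.etree.ElementTree rather than lxml, and because ElementTree has no parent:: axis, the abbreviated step `..` stands in for `parent::*`. The XML string reproduces the sample data from the question.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question.
XML = """<Images>
    <Unknown1>
        <Image url="http://a.jpg" type="art" id="1"/>
    </Unknown1>
    <Unknown2>
        <Image url="http://b.jpg" type="art" id="1"/>
        <Image url="http://c.jpg" type="art" id="2"/>
        <Image url="http://d.jpg" type="draft" id="3"/>
        <Image url="http://e.jpg" type="draft" id="4"/>
        <Image url="http://f.jpg" type="poster" id="5"/>
        <Image url="http://g.jpg" type="poster" id="6"/>
    </Unknown2>
</Images>"""

root = ET.fromstring(XML)

# Select every Image of type 'art', then step up to its parent.
# ElementTree yields each parent only once.
parents = root.findall(".//Image[@type='art']/..")

for parent in parents:
    # len(parent) counts ALL children of the parent, not only the matches.
    print(parent.tag, len(parent))
```

Each parent comes back once, with all of its original children still attached.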

I find http://www.zvon.org to be quite helpful whenever it comes to xpath. It even has a little testing ground: http://www.zvon.org/comp/tests/r/test-xlab.html#intro

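@Andersson is right: if you query for the parent, you get back the parent including all of its children, not just the matching ones. XPath can only select nodes that already exist in the document; it cannot build a filtered copy of a parent. So you have to select the matching children and then get each one's parent in Python (for example with lxml's getparent()), grouping the children by parent yourself.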
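The grouping step could look roughly like the sketch below. It assumes the standard library's xml.etree.ElementTree instead of lxml, and `filter_parents` is a hypothetical helper name, not part of either library. It builds a new element per parent that holds copies of the matching children only, so the original tree is left unchanged. The type value is interpolated directly into the path, which assumes it contains no quote characters.

```python
import copy
import xml.etree.ElementTree as ET

# Sample data copied from the question.
XML = """<Images>
    <Unknown1>
        <Image url="http://a.jpg" type="art" id="1"/>
    </Unknown1>
    <Unknown2>
        <Image url="http://b.jpg" type="art" id="1"/>
        <Image url="http://c.jpg" type="art" id="2"/>
        <Image url="http://d.jpg" type="draft" id="3"/>
        <Image url="http://e.jpg" type="draft" id="4"/>
        <Image url="http://f.jpg" type="poster" id="5"/>
        <Image url="http://g.jpg" type="poster" id="6"/>
    </Unknown2>
</Images>"""


def filter_parents(root, image_type):
    """Return new parent elements holding only the matching Image children.

    Hypothetical helper for illustration; assumes image_type has no quotes.
    """
    filtered = []
    # ".//*[Image]" selects every descendant that has at least one Image
    # child, whatever its own tag name is.
    for parent in root.findall(".//*[Image]"):
        matches = parent.findall(f"Image[@type='{image_type}']")
        if not matches:
            continue  # skip parents without any matching child
        # Build a fresh element so the source tree is not modified.
        clone = ET.Element(parent.tag, dict(parent.attrib))
        clone.extend([copy.deepcopy(child) for child in matches])
        filtered.append(clone)
    return filtered


root = ET.fromstring(XML)
result = filter_parents(root, "art")

for element in result:
    print(element.tag, [image.get("url") for image in element])
```

With lxml the same idea should carry over, since it offers a largely compatible API plus getparent() on each element; that variant is untested here.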

Upvotes: 4
