i_trope

Reputation: 1604

lxml - get attribute of child based on parent class

I am trying to extract hrefs from the first child of td tags with the class foo. An example DOM is:

<td class="foo">
   <a href="www.foobar1.com"></a>
</td>
<td class="foo">
   <a href="www.foobar2.com"></a>
</td>

From this I would like to get ["www.foobar1.com", "www.foobar2.com"]

So far I have the following:

import requests
from lxml import html

def get_hrefs(url):
    page = requests.get(url)
    tree = html.fromstring(page.text)
    td_elements = tree.xpath('//td[@class="foo"]')

    return [el.find("a").attrib["href"] for el in td_elements]

However, I suspect it would be more efficient to extend the XPath expression rather than iterate over the elements in Python, but I am not sure how to construct it.

Thank you.

Upvotes: 1

Views: 475

Answers (1)

alecxe

Reputation: 473893

Yes, you can simplify it by selecting the @href attribute of the a tag inside each td directly in the XPath expression:

return tree.xpath('//td[@class="foo"]/a/@href')
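For illustration, here is a minimal self-contained sketch that runs this expression against the example markup from the question. The SAMPLE string and the extract_hrefs name are assumptions made for the demo, and the td elements are wrapped in a table row so the HTML parser keeps them in place. The optional first_only flag (also hypothetical) switches to a[1], which mirrors el.find("a") by taking only the first a child of each td.

```python
from lxml import html

# Example markup from the question, wrapped in <table><tr> (an assumption
# for this demo) so the parser sees the td elements in a valid context.
SAMPLE = """
<table><tr>
<td class="foo">
   <a href="www.foobar1.com"></a>
</td>
<td class="foo">
   <a href="www.foobar2.com"></a>
</td>
</tr></table>
"""


def extract_hrefs(markup, first_only=False):
    """Return the href values of <a> children of td elements with class="foo"."""
    tree = html.fromstring(markup)
    if first_only:
        # a[1] keeps only the first <a> child of each matching td.
        expr = '//td[@class="foo"]/a[1]/@href'
    else:
        # Every direct <a> child of each matching td.
        expr = '//td[@class="foo"]/a/@href'
    # The XPath attribute results are str subclasses; convert to plain str.
    return [str(href) for href in tree.xpath(expr)]


hrefs = extract_hrefs(SAMPLE)
print(hrefs)
```

Note that @class="foo" matches only when the class attribute is exactly foo; a td with several classes would need a different predicate.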

Upvotes: 1
