lxml - get attribute of child based on parent class

Question

I am trying to extract hrefs from the first child of td tags with the class foo. An example DOM is:

From this I would like to get ["www.foobar1.com", "www.foobar2.com"].

So far I have the following:

import requests
from lxml import html

def get_hrefs(url):
    page = requests.get(url)
    tree = html.fromstring(page.text)
    td_elements = tree.xpath('//td[@class="foo"]')

    return [el.find("a").attrib["href"] for el in td_elements]

However, I feel it would be more efficient to extend the XPath expression instead of iterating over the elements in Python, but I am not sure how to construct it.

Thank you.

alecxe · Accepted Answer

Yes, you can simplify it by getting the @href from the a tag inside each td:

return tree.xpath('//td[@class="foo"]/a/@href')
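One difference is worth noting: el.find("a") in the question returns only the first a child of each td, while /a/@href returns the href of every a that is a direct child. If only the first link per cell is wanted, a positional predicate such as a[1] should do it. The sketch below is a minimal illustration, not the asker's actual page: the SAMPLE markup, the extra links, and the names first_hrefs and all_hrefs are assumptions made up for the example, since the original example DOM is not shown.

```python
from lxml import html

# Hypothetical markup, invented for illustration only.
SAMPLE = """
<html><body><table><tr>
  <td class="foo"><a href="www.foobar1.com">one</a><a href="www.extra.com">extra</a></td>
  <td class="bar"><a href="www.ignored.com">ignored</a></td>
  <td class="foo"><a href="www.foobar2.com">two</a></td>
  <td class="foo">no link in this cell</td>
</tr></table></body></html>
"""

tree = html.fromstring(SAMPLE)

# a[1] selects only the first <a> child of each matching td,
# which mirrors el.find("a") from the question.
first_hrefs = tree.xpath('//td[@class="foo"]/a[1]/@href')

# Without the predicate, every direct <a> child contributes its href.
all_hrefs = tree.xpath('//td[@class="foo"]/a/@href')

print(first_hrefs)
print(all_hrefs)
```

A side effect of doing this in XPath is that a td without any a child simply contributes nothing to the result, whereas el.find("a").attrib would raise an AttributeError for that cell because find returns None.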
