ren

Selecting the href of a link with an image inside using XPath

I'm using Scrapy to write a scraper that finds links containing images and grabs each link's href. The page I'm scraping is full of image thumbnails, and clicking a thumbnail leads to the full-size version of that image. I'd like to grab the full-size images.

The HTML looks roughly like this:

<a href="example.com/full_size_image.jpg">
     <img src="example.com/image_thumbnail.jpg">
</a>

And I want to grab "example.com/full_size_image.jpg".

My current approach is:

img_urls = scrapy.Selector(response).xpath('//a/img/..').xpath("@href").extract()

But I'd like to reduce that to a single XPath expression, because I plan to let users enter their own XPath expression string.
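For what it's worth, the two chained calls above can already be written as one expression, `//a/img/../@href`, because `..` is an ordinary location step that can be followed by `/@href`. The sketch below checks the parent-step part outside Scrapy using Python's standard-library `xml.etree.ElementTree`, which implements only a limited XPath subset: it understands `..` but not attribute steps like `/@href`, so the attribute is read with `.get()` instead. The sample markup is an assumption modeled on the question, with a self-closing `<img/>` and a wrapping `<div>` so it parses as XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup based on the question, wrapped in <div> so it parses as XML.
HTML = """
<div>
  <a href="example.com/full_size_image.jpg">
    <img src="example.com/image_thumbnail.jpg"/>
  </a>
  <a href="example.com/about.html">About</a>
</div>
"""

root = ET.fromstring(HTML)

# ".." climbs from each matching <img> back up to its parent <a>,
# mirroring the '//a/img/..' part of the question's selector.
parents = root.findall(".//a/img/..")

# ElementTree cannot select attributes with /@href, so read them directly.
hrefs = [a.get("href") for a in parents]
print(hrefs)  # ['example.com/full_size_image.jpg']
```

The text-only link is skipped because it has no `<img>` child for `..` to climb back from.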

Upvotes: 2

Views: 1557

Answers (1)

alecxe

You can check whether an element has a particular child element with a predicate:

response.xpath('//a[img]/@href').extract()

Note that I'm using the response.xpath() shortcut and providing a single XPath expression.
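One detail worth knowing: the `[img]` predicate only matches an `<img>` that is a *direct* child of the `<a>`. If thumbnails are sometimes wrapped in another element (for example a `<span>`), full XPath would use `//a[.//img]/@href` instead, e.g. `response.xpath('//a[.//img]/@href').extract()`. The sketch below illustrates the difference with the standard library's `xml.etree.ElementTree` rather than Scrapy. ElementTree supports the `[tag]` predicate but, by its documentation, only for immediate children, so the descendant check is done in Python. The markup and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one direct <img>, one <img> nested in a <span>, one text link.
HTML = """
<div>
  <a href="example.com/full_size_image.jpg"><img src="example.com/image_thumbnail.jpg"/></a>
  <a href="example.com/wrapped_full_size.jpg"><span><img src="example.com/wrapped_thumb.jpg"/></span></a>
  <a href="example.com/about.html">About</a>
</div>
"""

root = ET.fromstring(HTML)

# "[img]" keeps only <a> elements with an <img> as an immediate child, like //a[img].
direct = [a.get("href") for a in root.findall(".//a[img]")]

# Equivalent of //a[.//img]: keep <a> elements with an <img> anywhere beneath them.
nested = [a.get("href") for a in root.iter("a") if a.find(".//img") is not None]

print(direct)  # ['example.com/full_size_image.jpg']
print(nested)  # ['example.com/full_size_image.jpg', 'example.com/wrapped_full_size.jpg']
```

Which form to use depends on the page: `[img]` is stricter, while `[.//img]` tolerates extra wrapper elements around the thumbnail.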

Upvotes: 5
