Reputation: 270
I'm using scrapy to write a scraper that finds links with images inside them and grabs the link's href. The page I'm scraping is populated with image thumbnails, and when you click on the thumbnail it links to a full size version of the image. I'd like to grab the full size images.
The html looks somewhat like this:
<a href="example.com/full_size_image.jpg">
<img src="example.com/image_thumbnail.jpg">
</a>
And I want to grab "example.com/full_size_image.jpg".
My current method of doing so is:
img_urls = scrapy.Selector(response).xpath('//a/img/..').xpath("@href").extract()
But I'd like to reduce that to a single xpath expression, as I plan to allow the user to enter their own xpath expression string.
Upvotes: 2
Views: 1557
Reputation: 473903
You can check whether an element has a particular child element with a predicate:
response.xpath('//a[img]/@href').extract()
Note that I'm using the response.xpath() shortcut and providing a single XPath expression.
Upvotes: 5