OMRY VOLK

Reputation: 1481

Scrapy: don't follow links to images

Is there a way in Scrapy to not follow <a> tags pointing to images?

For example:

<a href="http://jamsphere.com/wp-content/uploads/2015/11/Franki-Dennull-PROFILE.jpg">

My code at the moment:

for url in set(response.xpath('//a/@href').extract()):
    yield scrapy.Request(response.urljoin(url), callback=self.parse)

Obviously I can add a hard-coded check, but I was wondering if there is a built-in option?

Upvotes: 1

Views: 343

Answers (1)

Guillaume

Reputation: 1879

Use a LinkExtractor. By default it filters out links with common image, video, audio, and other file extensions.

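The ignored extensions are listed in IGNORED_EXTENSIONS in the scrapy.linkextractors module, which is the default value of the extractor's deny_extensions argument.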

Upvotes: 2
