Ken

Reputation: 395

Scrapy: how to tell which page a request URL was extracted from

In Scrapy, we can read response.url and response.request.url, but how do we know which parent URL that URL was extracted from?

Thank you, Ken

Upvotes: 1

Views: 1071

Answers (1)

Gallaecio

Reputation: 3847

You can use Request.meta to keep track of such information.

When you yield your request, include response.url in the meta:

yield response.follow(link, …, meta={'source_url': response.url})

Then read it in the callback that parses the followed page:

source_url = response.meta['source_url']

That is the most straightforward way to do this, and you can use this method to keep track of original URLs even across different parsing methods, if you wish.

Otherwise, if the URL you see differs from the one you requested because of redirects, look into the redirect_urls meta key, which lists the URLs the request was redirected through before reaching the final response.
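As a rough sketch of reading that key: the helper below takes a meta dict and the final URL and returns the URL that was requested first. The function name is hypothetical, and it assumes redirect_urls, when present, holds the URLs in the order they were visited and does not include the final URL.

```python
def first_requested_url(meta, final_url):
    """Return the URL requested before any redirects took place.

    `meta` is expected to be a dict such as response.meta. The
    redirect_urls key is only present when a redirect happened,
    so fall back to the final URL otherwise.
    """
    redirect_urls = meta.get("redirect_urls") or []
    return redirect_urls[0] if redirect_urls else final_url
```

Inside a callback this would be called as first_requested_url(response.meta, response.url).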

Upvotes: 4
