user3185563

Reputation: 1484

Scrapy: synchronize user-agent change with IP address change

I'm looking at this guide to using Tor and a user-agent switcher with Scrapy. It's similar to other guides on the subject. Tor changes the IP address roughly every 10 minutes, while the middleware changes the user-agent on every request.

I'd like to synchronize the user-agent change with the IP address change. To achieve that, I'd need some code to run just before Scrapy sends a request. The code would check whether the IP has changed since the last request and, if so, change the user-agent. If the IP hasn't changed, it would keep using the same user-agent. I haven't been able to find a way of calling this code at the right place in the execution cycle.

The reason for wanting this is that, on the sites I'm scraping, it would look unusual for multiple requests with different user-agents to come from the same IP address.

Upvotes: 2

Views: 1106

Answers (1)

eLRuLL

Reputation: 18799

You'll have to use a Downloader Middleware and define its process_request method, which lets you modify the Request object before the actual request is made.

There you can keep a dict that maps each proxy to the user-agent chosen for it, so every request sent through a given proxy reuses the same user-agent. Remember that you can specify the proxy per request with request.meta['proxy'] = "host:port"
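A minimal sketch of that idea follows. The class name, the proxy URL and the user-agent strings are all hypothetical, made up for illustration. Note that it keys on the proxy address, so on its own it would not notice Tor rotating its exit IP behind a single proxy address.

```python
import random


class ProxyBoundUserAgentMiddleware:
    """Sketch of a downloader middleware that keeps one user-agent per proxy.

    The class name and constructor arguments are hypothetical; Scrapy only
    requires that the object exposes process_request(request, spider).
    """

    def __init__(self, proxies, user_agents):
        self.proxies = list(proxies)          # e.g. ["http://127.0.0.1:8118"]
        self.user_agents = list(user_agents)  # pool of user-agent strings
        self.agent_for_proxy = {}             # proxy -> user-agent chosen for it

    def process_request(self, request, spider):
        # Keep a proxy that is already set on the request, otherwise pick one.
        proxy = request.meta.get("proxy") or random.choice(self.proxies)
        request.meta["proxy"] = proxy

        # Pick a user-agent the first time a proxy is seen, then reuse it.
        if proxy not in self.agent_for_proxy:
            self.agent_for_proxy[proxy] = random.choice(self.user_agents)
        request.headers["User-Agent"] = self.agent_for_proxy[proxy]

        # Returning None tells Scrapy to continue handling the request.
        return None
```

To try it, the class would need to be listed in the DOWNLOADER_MIDDLEWARES setting, with a priority lower than the built-in HttpProxyMiddleware (750 by default) so that the proxy is set before that middleware runs. Since the constructor here takes arguments, you would also need a from_crawler classmethod or hard-coded lists to build the instance.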

Upvotes: 1
