Reputation: 135
I want to scrape a website, but it should look like I am from a specific country (let's say the USA for this example), to make sure that my results are valid.
I am working in Python (Scrapy), and for scraping I am using rotating user agents (see: https://pypi.org/project/scrapy-fake-useragent-fix/).
The rotating user agents do what I need for scraping. But can I combine this with something in the request so that it pretends I am in a specific country?
If there are any possibilities (in Scrapy/Python), please let me know. Appreciated!
Example of how I use the user agents in my script:
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}
Upvotes: 0
Views: 1511
Reputation: 1228
Hello @helloworld1990,
Based on your requirement: if you want to make each request from a different IP, i.e. use IP rotation (needed when the site detects and blocks you after a certain number of requests), then go for a proxy provider. There are many such providers; you just need to google them.
If that is not the case, then for short-term use you can try TOR IPs. But TOR IPs are well known and are often blocked. Otherwise, you can buy a few static IPs from a proxy provider and make your requests through those (see the sketch after the decision tree below).
if (uniqueIpForEachRequestFromDifferentGeoLocations) {
    // go for proxy providers - IP rotation
} else {
    if (shortTermUse) {
        // go for TOR nodes
    } else {
        // go for static IPs
    }
}
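If you go the proxy-provider route, a minimal sketch of per-request proxy rotation in Scrapy could look like the following. The PROXY_LIST setting and the RandomProxyMiddleware name are assumptions for illustration, not part of Scrapy itself; Scrapy's built-in HttpProxyMiddleware then routes each request through whatever is set in request.meta['proxy'].

    # middlewares.py -- sketch of a rotating-proxy middleware
    # PROXY_LIST and RandomProxyMiddleware are hypothetical names, not Scrapy built-ins.
    import random


    class RandomProxyMiddleware:
        """Attach a random proxy from PROXY_LIST to every outgoing request.

        Scrapy's built-in HttpProxyMiddleware picks up request.meta['proxy']
        and sends the request through that proxy.
        """

        def __init__(self, proxies):
            self.proxies = proxies

        @classmethod
        def from_crawler(cls, crawler):
            # e.g. PROXY_LIST = ['http://user:pass@1.2.3.4:8080', ...] in settings.py
            return cls(crawler.settings.getlist('PROXY_LIST'))

        def process_request(self, request, spider):
            if self.proxies:
                request.meta['proxy'] = random.choice(self.proxies)

Enabling it is then a matter of adding 'myproject.middlewares.RandomProxyMiddleware' to DOWNLOADER_MIDDLEWARES in settings.py (alongside your user-agent middleware) and filling PROXY_LIST with the proxy URLs you bought from the provider.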
Cheers! Hope this helps..
Upvotes: 0
Reputation: 1801
To pretend to be in a certain country you need an IP address from that country. Unfortunately, this is not something you can configure just with Scrapy settings. But you could use a proxy service like Crawlera:
Note: unfortunately this service is not free and the cheapest plan is about 25 EUR. There are many other, cheaper services available. The reason Crawlera is expensive is that they offer ban detection and only serve good IPs for your chosen domain. I've found them worth the cost on Amazon and Google, though on smaller domains a cheaper service with unlimited requests would be more suitable.
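A minimal sketch of the Scrapy side, assuming the scrapy-crawlera plugin is installed and you have a Crawlera API key (the key value below is a placeholder):

    # settings.py -- sketch of a Crawlera setup via the scrapy-crawlera plugin
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_crawlera.CrawleraMiddleware': 610,
    }

    CRAWLERA_ENABLED = True
    CRAWLERA_APIKEY = '<your-crawlera-api-key>'  # placeholder, from your Crawlera account

Which countries the outgoing IPs come from is configured on the provider's side; check Crawlera's documentation for the region options available on your plan.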
Upvotes: 1
Reputation:
You can do this using Selenium (I don't know about Scrapy). First tell the bot to go to this site: Proxy Site.
Then enter your target site in the search box and scrape.
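A minimal Selenium sketch of that idea. The proxy-site URL and the name of its input field are assumptions for illustration; substitute the actual web-proxy page you use:

    # Sketch: load a target page through a web proxy's front end with Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://www.example-proxy-site.com")  # hypothetical web-proxy site

    # Type the target site into the proxy's search box and submit.
    search_box = driver.find_element(By.NAME, "url")  # field name is an assumption
    search_box.send_keys("https://www.target-site.com")
    search_box.submit()

    # The proxied page is now loaded; scrape it as usual.
    html = driver.page_source
    driver.quit()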
Upvotes: 0