Reputation: 4168
I want to use a proxy IP for web scraping with Scrapy. To use a proxy, I set the environment variable http_proxy as described in the documentation:
$ export http_proxy=http://proxy:port
To test whether the change of IP worked, I created a new spider named test:
from scrapy.contrib.spiders import CrawlSpider

class TestSpider(CrawlSpider):
    name = "test"
    domain_name = "whatismyip.com"
    start_urls = ["http://whatismyip.com"]

    def parse(self, response):
        print response.body
        open('check_ip.html', 'wb').write(response.body)
But when I run this spider, check_ip.html does not show the IP of the proxy specified in the environment variable; it shows my original IP, the same as before crawling.
What is the problem? Is there an alternative way to check whether I am using a proxy IP, or another way to use a proxy IP?
Upvotes: 4
Views: 4425
Reputation: 1671
Edit settings.py in your current project and make sure you have HttpProxyMiddleware enabled:
DOWNLOADER_MIDDLEWARES = {
    # you need this line in order to scrape through a proxy / proxy list
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110,
}
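As a quick sanity check that the exported variable is actually visible to the process, you can ask urllib which proxies it discovers from the environment; HttpProxyMiddleware reads its proxy settings from these same environment variables. A minimal sketch (Python 3 syntax; the proxy address below is a placeholder, substitute your real proxy:port):

```python
import os
from urllib.request import getproxies_environment

# Placeholder proxy address -- replace with your real proxy:port.
# Setting it here affects this process only; exporting it in the shell
# before launching scrapy has the same effect for the scrapy process.
os.environ['http_proxy'] = 'http://127.0.0.1:8080'

# getproxies_environment() returns the proxy mapping found in
# environment variables such as http_proxy.
print(getproxies_environment())
```

If this prints an empty mapping (or the wrong address) inside the shell where you run scrapy, the environment variable never reached the process. The middleware also honours a per-request proxy set via request.meta['proxy'] inside the spider, which is a useful alternative when you rotate through a proxy list.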
Upvotes: 2