Reputation: 463
I have a problem scraping a website that requires cookies; I'm using Scrapy but I can't obtain the data correctly.
I need to set a cookie for the site, because when I log in from a browser it asks me to select a city before showing the relevant information.
I have tried a few possible solutions without success.
from scrapy import Spider, Request
from scrapy.selector import Selector

class DmozSpider(Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"
    ]

    def parse(self, response):
        # The request must be yielded (or returned), otherwise
        # Scrapy never schedules it.
        yield Request(url="http://www.example.com",
                      cookies={'currency': 'USD', 'country': 'UY'},
                      callback=self.parse_page2)

    def parse_page2(self, response):
        sel = Selector(response)
        print(sel)
I have no idea where to place these functions. For example, I could also use the start_requests function:
from scrapy import Spider, Request

class MySpider(Spider):
    name = "spider"
    allowed_domains = ["example.com"]  # must be a list, not a string

    def start_requests(self):
        return [Request(url="http://www.example.com",
                        cookies={'currency': 'USD', 'country': 'UY'})]
I'm doing it this way, but I'm not sure it's right. How should I handle the start_requests function correctly? How should I handle the request_with_cookies request correctly? What is the correct way to attach cookies to a URL? Should I put
name = "dmoz"
allowed_domains = ["dmoz.org"]
start_urls = ["http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"]
in the class when I use start_requests or request_with_cookies?
Upvotes: 1
Views: 343
Reputation: 1501
Try setting the headers
parameter on the request as well (cookies are sent as a header), like so:
Request(..., headers = {'Cookie': 'currency=USD; country=UY'}, ...)
Note that name=value pairs in a Cookie header are separated by "; ", not "&".
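To illustrate the header format, here is a minimal stdlib-only sketch that builds a Cookie header value from a dict (using the cookie names from the question):

```python
# Build a Cookie header value from a dict of cookies.
# Pairs in the Cookie header are joined with "; ", not "&".
cookies = {"currency": "USD", "country": "UY"}
header_value = "; ".join(f"{name}={value}" for name, value in cookies.items())
print(header_value)  # currency=USD; country=UY
```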
You can also try activating the dont_merge_cookies option in the meta parameter of Request:
Request(..., meta = {'dont_merge_cookies' : True}, ...)
This tells the crawler to ignore any other cookies set by the site and use only the ones you supply, in case yours would otherwise be overridden by merging.
Which of these works probably depends on the site's behaviour, so try them in turn and see if they solve the problem.
Upvotes: 2