C.shayv

Reputation: 71

Scrapy: How to get cookies from splash

I am trying to get the cookies from a splash request, but I keep getting an error.

Here is the code I am using:

class P2PEye(scrapy.Spider):
    name = 'p2peyeSpider'
    allowed_domains = ['p2peye.com']
    start_urls = ['https://www.p2peye.com/platform/h9/']

    def start_requests(self):
        script = '''
        function main(splash)
          local url = splash.args.url
          assert(splash:go(url))
          assert(splash:wait(0.5))
          return {
            cookies = splash:get_cookies(),
          }
        end
        '''
        for url in self.start_urls:
            yield SplashRequest(
                url,
                callback=self.parse,
                endpoint='render.html',
                args={'wait': 1, 'lua_source': script},
            )

    def parse(self, response):
        print(response.request.headers.getlist('Set-Cookie'))
        print(response.cookiejar)

This is my settings.py

SPLASH_URL = 'http://127.0.0.1:8050'
CRAWLERA_ENABLED = False
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {'scrapy_splash.SplashDeduplicateArgsMiddleware': 100 }
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
COOKIES_ENABLED = True
COOKIES_DEBUG = True
SPLASH_COOKIES_DEBUG = True

response.request.headers.getlist('Set-Cookie') returns [], and accessing response.cookiejar raises AttributeError: 'SplashTextResponse' object has no attribute 'cookiejar'. How can I get the cookies without causing an error?

Upvotes: 2

Views: 2400

Answers (2)

Franz Gastring

Reputation: 1130

If you run the Lua script below through Splash's execute endpoint, the response is a JSON object (a dict once decoded) with the cookies under the key cookies:

function main(splash)
  local url = splash.args.url
  assert(splash:go(url))
  assert(splash:wait(0.5))
  return {
    cookies = splash:get_cookies(),
  }
end

To access them, decode the JSON response and read that key:

# d is the decoded JSON returned by Splash, e.g. d = requests.post('splash').json()
print(d['cookies'])
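In case it helps, here is a minimal standard-library sketch of the same idea: POST the script to Splash's execute endpoint and read the cookies key from the decoded JSON. The function names build_execute_request and fetch_cookies are made up for illustration, and the Splash address is whatever your SPLASH_URL is.

```python
import json
import urllib.request

# Same script as above, shortened slightly.
LUA_SCRIPT = """
function main(splash)
  assert(splash:go(splash.args.url))
  assert(splash:wait(0.5))
  return {cookies = splash:get_cookies()}
end
"""


def build_execute_request(splash_url, page_url, lua_source=LUA_SCRIPT):
    """Build (but do not send) a JSON POST to Splash's /execute endpoint."""
    payload = json.dumps({"url": page_url, "lua_source": lua_source})
    return urllib.request.Request(
        splash_url.rstrip("/") + "/execute",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_cookies(splash_url, page_url):
    """Send the request and return the 'cookies' field of the decoded JSON."""
    request = build_execute_request(splash_url, page_url)
    with urllib.request.urlopen(request) as resp:
        d = json.load(resp)
    return d["cookies"]
```

With a Splash instance running, fetch_cookies('http://127.0.0.1:8050', 'https://www.p2peye.com/platform/h9/') should give you the list of cookie dicts that splash:get_cookies() returned.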

Upvotes: 1

Lucas Wieloch

Reputation: 818

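response.cookiejar is only set when scrapy-splash gives you a SplashJsonResponse. That requires endpoint='execute' and a Lua script that returns a table; with endpoint='render.html' the script is not run and you get a SplashTextResponse, hence the AttributeError.

Switch the endpoint to execute and try returning extra fields from your Lua script: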

script = '''
function main(splash)
  local url = splash.args.url
  assert(splash:go(url))
  assert(splash:wait(0.5))
  local entries = splash:history()
  local last_response = entries[#entries].response
  return {
    url = splash:url(),
    headers = last_response.headers,
    http_status = last_response.status,
    cookies = splash:get_cookies(),
    html = splash:html(),
  }
end
'''
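Once the response is a SplashJsonResponse, the table returned by the script should also be available decoded as response.data, so the callback can read the cookies from there. A rough sketch of a helper you could call from parse, assuming the request was sent with endpoint='execute'; cookies_from_splash_response is a hypothetical name, and the SimpleNamespace objects below are only stand-ins for real scrapy-splash responses so the snippet runs on its own:

```python
from types import SimpleNamespace


def cookies_from_splash_response(response):
    """Return {name: value} from a Splash JSON response, or {} if unavailable."""
    # Text responses (e.g. from render.html) have no decoded `data` attribute.
    data = getattr(response, "data", None)
    if not isinstance(data, dict):
        return {}
    cookies = data.get("cookies")
    # Guard against a missing key or a non-list value.
    if not isinstance(cookies, list):
        return {}
    return {cookie["name"]: cookie["value"] for cookie in cookies}


# Stand-ins for illustration only (not real scrapy-splash classes).
json_like = SimpleNamespace(
    data={"cookies": [{"name": "sid", "value": "abc123", "domain": ".p2peye.com"}]}
)
text_like = SimpleNamespace(text="<html></html>")

print(cookies_from_splash_response(json_like))  # {'sid': 'abc123'}
print(cookies_from_splash_response(text_like))  # {}
```

In the spider you would call it as cookies_from_splash_response(response) inside parse; the getattr check means it returns an empty dict instead of raising if the response is not the JSON kind.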

Upvotes: 2
