Jake

Reputation: 330

Python HTTP cookie jar implementation in aiohttp

I'm trying to port the code below, which sends requests to Google Search, to aiohttp. My version seems equivalent, but for some reason it does not set cookies as expected. Any help?

import os
from http.cookiejar import LWPCookieJar
from urllib.request import Request, urlopen

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'
cookie_jar = LWPCookieJar(os.path.join(home_folder, '.google-cookie'))
cookie_jar.load()


def get_page(url, user_agent=None, verify_ssl=True):
    if user_agent is None:
        user_agent = USER_AGENT
    request = Request(url)
    request.add_header('User-Agent', user_agent)
    cookie_jar.add_cookie_header(request)
    response = urlopen(request)
    cookie_jar.extract_cookies(response, request)
    html = response.read()
    response.close()
    try:
        cookie_jar.save()
    except Exception:
        pass
    return html

My solution:

import aiohttp

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'
abs_cookie_jar = aiohttp.CookieJar()
abs_cookie_jar.load('.aiogoogle-cookie')


async def get_page(url, user_agent=None, verify_ssl=True):
    if user_agent is None:
        user_agent = USER_AGENT
    async with aiohttp.ClientSession(headers={'User-Agent': user_agent}, cookie_jar=abs_cookie_jar) as session:
        response = await session.get(url)
        if response.cookies:
            abs_cookie_jar.update_cookies(cookies=response.cookies)
            abs_cookie_jar.save('.aiogoogle-cookie')
        html = await response.text()
    return html

Upvotes: 1

Views: 2848

Answers (1)

bazko1

Reputation: 31

When you request google.com, you get redirected. As a result, three HTTP requests are performed, with response codes 301, 302 and 200 (you can inspect the intermediate redirect responses through the response.history attribute).

The Set-Cookie header is sent with the first response, but the response variable holds the last one, which does not contain cookies. That is why response.cookies is empty in your code, so the if branch, including the save() call, never runs.

The update step in your implementation, abs_cookie_jar.update_cookies(cookies=response.cookies), is not needed, because aiohttp updates the session's cookie jar automatically for every request, including the redirects (see the aiohttp source).
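The redirect effect described above can be reproduced offline. The sketch below is only an illustration and makes several assumptions: it uses the standard library's urllib and http.cookiejar rather than aiohttp, and it talks to a throwaway local server (the RedirectHandler class, the run_demo function and the session cookie are hypothetical names, not part of the question). The first response sets a cookie and redirects; the final response carries no Set-Cookie header, yet the jar still ends up holding the cookie because it is updated on every hop.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: '/' sets a cookie and redirects to '/final'."""

    def do_GET(self):
        if self.path == "/":
            # First hop: the cookie is set here, on the redirect response.
            self.send_response(302)
            self.send_header("Location", "/final")
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            # Final hop: a plain 200 with no Set-Cookie header.
            body = b"done"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo():
    """Return (Set-Cookie header of the final response, cookie names in the jar, body)."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        jar = CookieJar()
        # ProxyHandler({}) disables any proxy configured in the environment.
        opener = build_opener(ProxyHandler({}), HTTPCookieProcessor(jar))
        url = "http://127.0.0.1:%d/" % server.server_port
        with opener.open(url) as response:
            final_set_cookie = response.headers.get("Set-Cookie")
            body = response.read()
        return final_set_cookie, [cookie.name for cookie in jar], body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

The final response alone tells you nothing about the cookie, which is why checking the jar (or, in aiohttp, response.history) is the more reliable way to see what was set.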

How your solution could be fixed:

import asyncio

import aiohttp

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'
abs_cookie_jar = aiohttp.CookieJar()
abs_cookie_jar.load('.aiogoogle-cookie')

async def get_page(url, user_agent=None, verify_ssl=True):
    if user_agent is None:
        user_agent = USER_AGENT
    async with aiohttp.ClientSession(headers={'User-Agent': user_agent}, cookie_jar=abs_cookie_jar) as session:
        response = await session.get(url)

        html = await response.text()

        # display redirect responses
        for resp in response.history:
            print(resp)

        # print the stored cookies in human-readable form
        for cookie in abs_cookie_jar:
            print(cookie)

        # save the jar, which already holds the cookies from every response
        abs_cookie_jar.save('.aiogoogle-cookie')

    return html

loop = asyncio.get_event_loop()

loop.run_until_complete(get_page('https://google.com'))
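One thing the snippet above does not cover is the very first run, when '.aiogoogle-cookie' does not exist yet and load() has nothing to read. A minimal sketch of one way to handle that, assuming aiohttp and yarl are installed: load_jar and roundtrip are hypothetical helper names, the sid cookie and example.com URL are made-up stand-ins for a real response, and the jar is created inside a coroutine on the assumption that some aiohttp versions expect a running event loop at that point.

```python
import asyncio
import os
import tempfile

import aiohttp
from yarl import URL


def load_jar(path):
    """Return a CookieJar, restoring saved cookies only if the file exists."""
    jar = aiohttp.CookieJar()
    if os.path.exists(path):
        jar.load(path)
    return jar


async def roundtrip(path):
    # Called from a coroutine, so an event loop is already running.
    jar = load_jar(path)  # first run: the file is absent, the jar starts empty
    # Stand-in for a real response: store one made-up cookie by hand.
    jar.update_cookies({"sid": "abc"}, response_url=URL("https://example.com/"))
    jar.save(path)
    # Load the file back into a fresh jar and list the cookie names it holds.
    restored = load_jar(path)
    return sorted(morsel.key for morsel in restored)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(asyncio.run(roundtrip(os.path.join(tmp, "cookies.pickle"))))
```

In get_page the same load_jar helper could replace the module-level load() call, with save() left where it is.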

Upvotes: 2
