Reputation: 330
I'm trying to port the code below, which sends requests to Google Search, to aiohttp. My solution seems to be equivalent, but for some reason it does not set cookies as desired. Any help?
import os
from http.cookiejar import LWPCookieJar
from urllib.request import Request, urlopen

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'

# Assumption: home_folder is not defined in the original snippet
home_folder = os.path.expanduser('~')
cookie_jar = LWPCookieJar(os.path.join(home_folder, '.google-cookie'))
cookie_jar.load()

def get_page(url, user_agent=None, verify_ssl=True):
    if user_agent is None:
        user_agent = USER_AGENT
    request = Request(url)
    request.add_header('User-Agent', user_agent)
    # attach stored cookies to the outgoing request
    cookie_jar.add_cookie_header(request)
    response = urlopen(request)
    # collect cookies set by the response
    cookie_jar.extract_cookies(response, request)
    html = response.read()
    response.close()
    try:
        cookie_jar.save()
    except Exception:
        pass
    return html
My solution:
import aiohttp

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'

abs_cookie_jar = aiohttp.CookieJar()
abs_cookie_jar.load('.aiogoogle-cookie')

async def get_page(url, user_agent=None, verify_ssl=True):
    if user_agent is None:
        user_agent = USER_AGENT
    async with aiohttp.ClientSession(headers={'User-Agent': user_agent}, cookie_jar=abs_cookie_jar) as session:
        response = await session.get(url)
        if response.cookies:
            abs_cookie_jar.update_cookies(cookies=response.cookies)
        abs_cookie_jar.save('.aiogoogle-cookie')
        html = await response.text()
    return html
Upvotes: 1
Views: 2848
Reputation: 31
When you request google.com, you get redirected. As a result, three HTTP requests are performed, with response codes 301, 302 and 200 (you can inspect them through the response.history attribute).
The Set-Cookie header is added to the first response, but what you have in the response variable is the last one, which does not contain cookies.
The update step in your implementation:
abs_cookie_jar.update_cookies(cookies=response.cookies)
is not needed, because aiohttp already does that automatically for every request, including the redirected ones (see the aiohttp source).
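To see this behaviour without depending on Google, here is a minimal sketch that runs a throwaway local server with aiohttp.web. The /start and /final routes, the sid cookie and the demo() function are made up for illustration; only the redirect-plus-cookie pattern mirrors what is described above. The jar is created with unsafe=True solely because the demo server is addressed by IP, which aiohttp's jar ignores by default.

```python
import asyncio

import aiohttp
from aiohttp import web


async def start(request):
    # First hop (hypothetical route): set a cookie and redirect.
    return web.Response(
        status=302,
        headers={"Location": "/final", "Set-Cookie": "sid=abc; Path=/"},
    )


async def final(request):
    # Last hop (hypothetical route): plain 200 without a Set-Cookie header.
    return web.Response(text="done")


async def demo():
    app = web.Application()
    app.router.add_get("/start", start)
    app.router.add_get("/final", final)

    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, "127.0.0.1", 0)  # port 0: the OS picks a free port
    await site.start()
    port = runner.addresses[0][1]

    # unsafe=True is needed here only because the host is an IP address.
    jar = aiohttp.CookieJar(unsafe=True)
    try:
        async with aiohttp.ClientSession(cookie_jar=jar) as session:
            async with session.get(f"http://127.0.0.1:{port}/start") as response:
                await response.text()
                # Cookie names seen on each redirect hop, then on the final response.
                hops = [(r.status, sorted(r.cookies)) for r in response.history]
                last = (response.status, sorted(response.cookies))
        # Cookie names the session stored in the jar on its own.
        stored = sorted(cookie.key for cookie in jar)
    finally:
        await runner.cleanup()
    return hops, last, stored


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the cookie shows up on the redirect hop in response.history and in the jar, while response.cookies on the final response stays empty, and update_cookies is never called by hand.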
How your solution could be fixed:
import asyncio

import aiohttp

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'

abs_cookie_jar = aiohttp.CookieJar()
abs_cookie_jar.load('.aiogoogle-cookie')

async def get_page(url, user_agent=None, verify_ssl=True):
    if user_agent is None:
        user_agent = USER_AGENT
    async with aiohttp.ClientSession(headers={'User-Agent': user_agent}, cookie_jar=abs_cookie_jar) as session:
        response = await session.get(url)
        html = await response.text()
        # display the redirect responses
        for resp in response.history:
            print(resp)
        # print the cookies in a human-readable form
        for cookie in abs_cookie_jar:
            print(cookie)
        # save the jar, which already contains the cookies from every response
        abs_cookie_jar.save('.aiogoogle-cookie')
    return html

loop = asyncio.get_event_loop()
loop.run_until_complete(get_page('https://google.com'))
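One caveat with loading the jar at import time: load() fails on the very first run, when the cookie file does not exist yet, and some aiohttp releases warn when a CookieJar is created outside a running event loop. A possible way around both, as a sketch: open_jar and roundtrip are hypothetical helper names, and example.com with the sid cookie is only placeholder data.

```python
import asyncio
import os
import tempfile

import aiohttp
from yarl import URL  # yarl is installed together with aiohttp


def open_jar(path):
    """Return a CookieJar, preloaded from `path` when that file exists.

    Meant to be called from inside a coroutine, so a loop is already running.
    """
    jar = aiohttp.CookieJar()
    if os.path.exists(path):
        jar.load(path)
    return jar


async def roundtrip(path):
    jar = open_jar(path)  # first run: the file is missing, so the jar starts empty
    # Placeholder cookie standing in for whatever a real response would set.
    jar.update_cookies({"sid": "abc"}, response_url=URL("http://example.com/"))
    jar.save(path)
    reloaded = open_jar(path)  # second run: the saved cookies are read back
    return sorted((cookie.key, cookie.value) for cookie in reloaded)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(asyncio.run(roundtrip(os.path.join(tmp, "demo-cookie"))))
```

Inside get_page, the same helper could replace the module-level jar: call open_jar('.aiogoogle-cookie') at the top of the coroutine and pass the result as cookie_jar to ClientSession.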
Upvotes: 2