Siddharth Chabra

Reputation: 488

Python requests.session cookie retrieval issue

I am trying to scrape data from a website and seem to have an issue getting cookies from it when I use requests.session. The code below explains it better:

import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36'}
url = "https://www.nseindia.com"
r_without_headers = requests.get(url)
print("response code",r_without_headers.status_code)
print("Resp no header cookies ",r_without_headers.cookies.get_dict())
r_with_headers = requests.get(url,headers = headers)
print("response code",r_with_headers.status_code)
print("Resp with header cookies ",r_with_headers.cookies.get_dict())

s1 = requests.session()
s1_req = s1.get(url)
print("response code",s1_req.status_code)
print("Session no header Cookies ", s1.cookies.get_dict())
print("Session no header Response Cookies", s1_req.cookies.get_dict())

s2 = requests.session()
s2.headers = headers
s2_req = s2.get(url)
print("response code",s2_req.status_code)
print("Session with header Cookies ", s2.cookies.get_dict())
print("Session with header Response Cookies", s2_req.cookies.get_dict())

Output

response code 200
Resp no header cookies  {}
response code 200
Resp with header cookies  {'ak_bmsc': 'F4040D045001A7CD57BBC58C09C9117F174C9D8E21750000240B665BDDE23467~plhUo272BWU9CTPiQAEJgiZ07qX/BOE0n6iOU8y9pewbmXipo8de1YROpMw6AEtjQDgdt3x+M/2QDATjSAtaRiDVlsDGZfohfsymElg0Xpq0Uta3OYSOSe2B48eg2lJD0CMios+0eqatEro6XvEkYAy+4D14EUHAE/eRp5oVUOpVL6JR8WMNNFoE6Xo7xYQtfLFu8hS1sUNABrYkr6XNFGY3YnkZmawa7imZswMI4tICc='}
response code 200
Session no header Cookies  {}
Session no header Response Cookies {}
response code 200
Session with header Cookies  {}
Session with header Response Cookies {}

The ISSUE

The website apparently requires a User-Agent before it sets a cookie: a GET request with the User-Agent header returns the expected cookie, and one without it does not.

When I try the same with requests.session, I don't get a response cookie, either with or without the header.

The Question

Why is this happening? Am I using sessions incorrectly, or is the website broken? (I would not be surprised if it were.)

How do I get the cookies using sessions?

My current idea for a workaround is to send a plain GET request, retrieve the cookies, and set them in the session manually. But this does not seem right: if the cookies get modified on subsequent requests, there is no guarantee the session will update them, since the session was not able to retrieve the cookie in the first place. I would rather not write my entire code with bare requests calls and manually transfer cookies to every subsequent request.

Upvotes: 0


Answers (1)

AKX

Reputation: 168913

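The problem is that s2.headers = headers replaces the session's headers object (which under the hood is a CaseInsensitiveDict, not an actual dict) with a plain dict of your own. That also discards the defaults the session would otherwise send (Accept, Accept-Encoding and Connection), so requests made through s2 carry only your User-Agent.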
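One way to see what the assignment changes, without contacting the site at all, is to ask each session to prepare (but not send) a request and compare the headers it would carry. This is a minimal sketch: Session.prepare_request and requests.Request are part of the public requests API, the helper name headers_sent is hypothetical, and the URL is simply the one from the question.

```python
import requests

URL = "https://www.nseindia.com"
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0"}


def headers_sent(session):
    """Hypothetical helper: headers a GET to URL would carry; nothing is sent."""
    prepared = session.prepare_request(requests.Request("GET", URL))
    return dict(prepared.headers)


# Assigning a plain dict replaces the session's default headers entirely.
assigned = requests.Session()
assigned.headers = dict(BROWSER_HEADERS)

# Updating keeps the defaults and only overrides the User-Agent.
updated = requests.Session()
updated.headers.update(BROWSER_HEADERS)

print("assigned:", headers_sent(assigned))
print("updated: ", headers_sent(updated))
```

The assigned session should show only the User-Agent, while the updated one keeps the session defaults next to it. The exact Accept-Encoding value depends on which compression packages are installed, so it is not spelled out here.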

Instead just update it:

s2.headers.update(headers)

E.g.

import requests
url = "https://www.nseindia.com"
s2 = requests.session()
s2.headers.update({'User-Agent': 'Mozilla/5.0'})
s2_req = s2.get(url)
print("Session with header Cookies ", s2.cookies.keys())

happily outputs

Session with header Cookies  ['ak_bmsc']
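Once the session sends the right headers, its cookie jar is also updated from every response's Set-Cookie header, so cookies that change on later requests do not need to be copied over by hand. The sketch below checks this against a hypothetical local server (CookieHandler is made up for illustration and is not the real site) that sets a cookie only for Mozilla-like User-Agents and issues a new value on every request.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class CookieHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: sets a 'token' cookie only for
    Mozilla-like User-Agents, with a new value on each request."""

    counter = 0

    def do_GET(self):
        CookieHandler.counter += 1
        self.send_response(200)
        if self.headers.get("User-Agent", "").startswith("Mozilla"):
            self.send_header("Set-Cookie", f"token=v{CookieHandler.counter}; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Port 0 lets the OS pick a free port.
server = HTTPServer(("127.0.0.1", 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

with requests.Session() as s:
    s.headers.update({"User-Agent": "Mozilla/5.0"})
    s.get(url)
    first = s.cookies.get("token")   # set by the first response
    s.get(url)
    second = s.cookies.get("token")  # replaced by the second response

with requests.Session() as plain:
    plain.get(url)                   # default python-requests User-Agent
    no_ua = plain.cookies.get("token")

server.shutdown()
server.server_close()
print(first, second, no_ua)
```

Against this toy server it should print v1 v2 None: the browser-like session picks up the cookie and follows its change, while the default session never receives one, mirroring the behaviour described in the question.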

Upvotes: 1
