Reputation: 461
I used the following Python code to retrieve a web page behind a login page successfully for some years:
import requests
from urllib.parse import quote

username = 'user'
password = 'pass'
login_url = 'https://company.com/login?url='
redirect_url = 'https://epaper.company.com/'
data = {'email': username, 'pass': password}
# Append the percent-encoded redirect target to the login URL
initial_url = login_url + quote(redirect_url)
response = requests.post(initial_url, data=data)
Then something changed at company.com about 2 months ago, and the request started returning status code 400. I tried changing the data parameter to json (response = requests.post(initial_url, json=data)), which gave me a 200 response telling me a wrong password was provided.
Any ideas what I could try to debug?
Thanks, Jan
Update: I just tried using a requests session to retrieve the csrf_token from the login page (as suggested here), so now my code reads:
from bs4 import BeautifulSoup

with requests.Session() as sess:
    # Fetch the login page first so the session picks up cookies and the CSRF token
    response = sess.get(login_url)
    signin = BeautifulSoup(response.content, 'html.parser')
    data['csrf_token'] = signin.find('input', {'name': 'csrf_token'})['value']
    response = sess.post(initial_url, data=data)
Unfortunately, the response is still 400 (and 200/wrong password with the json parameter).
Upvotes: 2
Views: 3679
Reputation: 73
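First: When you pass data=data, requests sends the body with the header {"Content-Type":"application/x-www-form-urlencoded"}; when you pass json=data, it sends {"Content-Type":"application/json"} instead. The server may accept only one of the two, so check which one the login form in the browser actually uses.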
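To see the difference between the two keyword arguments without contacting the server, here is a minimal sketch that only prepares the requests and never sends them. The URL and credentials are the placeholders from the question, and the variable names form_req and json_req are just for illustration:

```python
import requests

# Placeholder values from the question; nothing is sent over the network.
url = 'https://company.com/login'
data = {'email': 'user', 'pass': 'pass'}

# Prepare (but do not send) the same POST with each keyword argument.
form_req = requests.Request('POST', url, data=data).prepare()
json_req = requests.Request('POST', url, json=data).prepare()

print(form_req.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_req.body)                     # email=user&pass=pass
print(json_req.headers['Content-Type'])  # application/json
print(json_req.body)                     # the same fields serialized as JSON
```

Comparing these with the request the browser sends from the real login form (network tab of the developer tools) should show which encoding the server expects.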
Second: Perhaps redirects have been added. Try adding:
response = sess.post(url, data=data)
print("URL you expect", url)
print("Last request URL:", response.url)
Be sure to check:
print(sess.cookies.get_dict())
print(response.headers)
If you get an unexpected result when checking, disable automatic redirects so you can inspect the first response (its status code and Location header) directly:
response = sess.post(url, data=data, allow_redirects=False)
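If redirects are followed, the intermediate responses are kept in response.history, so the whole chain can be listed in one go. A rough sketch, assuming a helper named redirect_chain (a hypothetical name, not part of requests):

```python
import requests

def redirect_chain(response):
    """Return (status_code, url, Location header) for each hop, oldest first."""
    # response.history holds the intermediate redirect responses;
    # the final response is appended as the last hop.
    hops = list(response.history) + [response]
    return [(r.status_code, r.url, r.headers.get('Location')) for r in hops]

# Usage with a live session (commented out because it needs network access;
# url and data are the variables from the snippets above):
# with requests.Session() as sess:
#     response = sess.post(url, data=data)
#     for hop in redirect_chain(response):
#         print(hop)
```

A 301 or 302 hop followed by a GET would suggest that the POST body is being dropped along the way, which is worth ruling out before looking at the credentials themselves.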
Upvotes: 1