JonasUJ

Reputation: 108

Python requests 422 error on post

I've been trying to scrape a website that, like GitHub, requires login authentication but, unlike GitHub, does not have an API. I've followed these instructions and many others, but nothing seems to work; every attempt just returns a 422 error.

import requests
from lxml import html

url = "https://github.com/login"
user = "my email"
pas = "associated password"

sess = requests.Session()
r = sess.get(url)

rhtml = html.fromstring(r.text)

#get all hidden input fields and make a dict of them
hidden = rhtml.xpath(r'//form//input[@type="hidden"]')
form = {x.attrib["name"]: x.attrib["value"] for x in hidden}

#add login creds to the dict
form['login'] = user
form['password'] = pas

#post
res = sess.post(url, data=form)

print(res)
# <Response [422]>

I've also tried just sess.post(url, data={'login':user, 'password':pas}) with the same result. Getting the cookies first and sending them with the POST doesn't seem to work either.

How can I log in and get the page behind the login, preferably without using Selenium?

Upvotes: 3

Views: 1495

Answers (1)

drec4s

Reputation: 8077

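That's because the form's action URL is different from the login page URL: the form has to be posted to the address in its action attribute, not back to the login page itself.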

This is how you can do it using requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

url = "https://github.com/login"
user = "<username>"
pwd = "<password>"

with requests.Session() as s:

    r = s.get(url)
    soup = BeautifulSoup(r.content, "lxml")

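    # collect the hidden fields and the URL the form posts to
    hidden = soup.find_all("input", {'type':'hidden'})
    target = "https://github.com" + soup.find("form")['action']
    # skip inputs without a name; treat a missing value as an empty string
    payload = {x["name"]: x.get("value", "") for x in hidden if x.get("name")}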

    #add login creds to the dict
    payload['login'] = user
    payload['password'] = pwd

    r = s.post(target, data=payload)
    print(r)
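
If you want to check the target and payload before sending anything, here is a rough offline sketch of the same idea using only the standard library. It reads the action and hidden inputs of the first form only and resolves the action with urljoin, so a relative or empty action still works. LoginFormParser, build_login_request and the SAMPLE markup are names and data made up for illustration, not what GitHub actually serves.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect the action and hidden inputs of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = None      # raw action attribute of the first form
        self.hidden = {}        # name -> value of that form's hidden inputs
        self._in_form = False   # True while inside the first form
        self._done = False      # True once the first form has been closed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._in_form and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form:
            if attrs.get("type") == "hidden" and attrs.get("name"):
                # a hidden input without a value is sent as an empty string
                self.hidden[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_request(page_url, page_html, user, password):
    """Return (target_url, payload) for posting the first form on the page."""
    parser = LoginFormParser()
    parser.feed(page_html)
    # resolve a relative (or empty) action against the page URL
    target = urljoin(page_url, parser.action or "")
    payload = dict(parser.hidden)
    payload["login"] = user
    payload["password"] = password
    return target, payload


# Made-up markup for illustration; a real login page has more fields.
SAMPLE = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="hidden" name="commit">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
<form action="/search"><input type="hidden" name="q" value="x"></form>
"""

target, payload = build_login_request(
    "https://github.com/login", SAMPLE, "<username>", "<password>")
print(target)
print(payload)
```

With a live session you would pass r.text from the GET as page_html and then call s.post(target, data=payload). The field names 'login' and 'password' are the ones from the question; another site will probably name them differently, so check the form's markup first.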

Upvotes: 2
