Reputation: 850
My goal is to create an authenticated session on GitHub so I can use the advanced search (which has limited functionality for non-authenticated users). Currently the POST request returns a web page that says "What? Your browser did something unexpected. Please contact us if the problem persists."
Here is the code I am using to try to accomplish my task.
import requests
from lxml import html
s = requests.Session()
payload = (username, password)
_ = s.get('https://www.github.com/login')
p = s.post('https://www.github.com/login', auth=payload)
url = "https://github.com/search?l=&p=0&q=language%3APython+extension%3A.py+sklearn&ref=advsearch&type=Code"
r = s.get(url, auth=payload)
text = r.text
tree = html.fromstring(text)
Is what I'm trying to do possible? I would prefer not to use the GitHub v3 API, since it is rate limited and I want to do more of my own scraping of the advanced search. Thanks.
Upvotes: 1
Views: 1740
Reputation: 15376
As mentioned in the comments, GitHub uses POST data for authentication, so you should have your creds in the data parameter.
The elements you have to submit are 'login', 'password', and 'authenticity_token'. The value of 'authenticity_token' is dynamic, but you can scrape it from '/login'.
Finally, submit data to /session and you should have an authenticated session.
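import requests
from lxml import html  # tree.cssselect() also requires the cssselect package

s = requests.Session()
# Fetch the login page to get the session cookies and the hidden form fields
r = s.get('https://www.github.com/login')
tree = html.fromstring(r.content)
# Copy every named input (including the dynamic authenticity_token) into the payload
data = {i.get('name'): i.get('value') for i in tree.cssselect('input') if i.get('name')}
data['login'] = username  # username and password are assumed to be defined elsewhere
data['password'] = password
r = s.post('https://github.com/session', data=data)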
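If the login page ever contains more than one form, collecting every input on the page could pull in unrelated fields. Here is a rough sketch of a variant that only reads inputs inside the form that posts to /session. It uses the standard library's html.parser instead of lxml, so it does not need cssselect. FormInputCollector and build_login_payload are hypothetical names for this example, and it assumes the form's action attribute is exactly "/session", which I have not checked against the current page.

```python
from html.parser import HTMLParser


class FormInputCollector(HTMLParser):
    """Collect name/value pairs of <input> tags inside one specific <form>."""

    def __init__(self, action):
        super().__init__()
        self.action = action   # form action to look for (assumed "/session")
        self.in_form = False   # True while parsing inside the matching form
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("action") == self.action:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Inputs without a value attribute are stored as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_payload(login_html, username, password, action="/session"):
    """Return the POST data for the login form found in login_html."""
    collector = FormInputCollector(action)
    collector.feed(login_html)
    payload = dict(collector.fields)
    # Overwrite the empty credential fields scraped from the page.
    payload["login"] = username
    payload["password"] = password
    return payload
```

With the session from above you would call data = build_login_payload(r.text, username, password) after the GET to /login, then pass data to s.post exactly as before.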
Upvotes: 2