Reputation: 111
I can usually get around 403 errors by adding a known User-Agent, but now I'm trying to log in and then scrape, and I can't figure out how to get past this error.
Code:
import urllib
import http.cookiejar
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
authentication_url = 'https://www.linkedin.com/'
payload = {
    'session_key': 'email',
    'session_password': 'password'
}
data = urllib.parse.urlencode(payload)
binary_data = data.encode('UTF-8')
req = urllib.request.Request(authentication_url, binary_data)
resp = urllib.request.urlopen(req)
contents = resp.read()
Traceback:
Traceback (most recent call last):
  File "C:/Python34/loginLinked.py", line 16, in <module>
    resp = urllib.request.urlopen(req)
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 469, in open
    response = meth(req, response)
  File "C:\Python34\lib\urllib\request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python34\lib\urllib\request.py", line 507, in error
    return self._call_chain(*args)
  File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Upvotes: 0
Views: 1760
Reputation: 2576
See my answer to this question:
why isn't Requests not signing into a website correctly?
First of all, you really should use their API: http://developer.linkedin.com/apis
The front page URL (https://www.linkedin.com/) does not seem to accept a login POST with those parameters.
This is the login URL you must POST to: https://www.linkedin.com/uas/login-submit
Be aware that this probably won't work as-is either, since you need at least the csrfToken parameter from the login form.
You probably need loginCsrfParam too, which is also in the login form on the front page.
Something like this might work. It is not tested, and you may need to add the other POST parameters.
import requests

# One session, so cookies set by the front page are sent with the login POST.
s = requests.Session()

def get_csrf_tokens():
    # Fetch the front page and pull both hidden tokens out of the login form.
    url = "https://www.linkedin.com/"
    html = s.get(url).text
    csrf_token = html.split('name="csrfToken" value="')[1].split('" id="')[0]
    login_csrf_token = html.split('name="loginCsrfParam" value="')[1].split('" id="')[0]
    return csrf_token, login_csrf_token

def login(username, password):
    url = "https://www.linkedin.com/uas/login-submit"
    csrf_token, login_csrf_token = get_csrf_tokens()
    data = {
        'session_key': username,
        'session_password': password,
        'csrfToken': csrf_token,
        'loginCsrfParam': login_csrf_token,
    }
    # Return the response so the caller can inspect the status code and body.
    return s.post(url, data=data)

login('username', 'password')
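Splitting the page on fixed strings breaks as soon as the attribute order or quoting changes. A somewhat sturdier option is to collect the hidden form fields with the standard library's html.parser. This is only a sketch, assuming the tokens are served as hidden input elements: HiddenInputParser, extract_hidden_fields and the sample markup (including the token values) are made up for illustration, and the real login page may look different.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Attribute values can be None (e.g. a bare attribute), hence the "or".
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of every hidden input found in the given HTML text."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up markup for illustration only; not copied from the real page.
SAMPLE_HTML = """
<form action="/uas/login-submit" method="POST">
  <input type="text" name="session_key">
  <input type="hidden" name="csrfToken" value="ajax:123" id="csrfToken-login">
  <input type="hidden" name="loginCsrfParam" value="abc-def" id="loginCsrfParam-login">
</form>
"""

fields = extract_hidden_fields(SAMPLE_HTML)
print(fields)
```

In the login function you could then start the POST data from extract_hidden_fields(s.get(url).text) and add session_key and session_password on top, which would also pick up any other hidden parameters the form carries.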
Upvotes: 1