I work in digital marketing as a data analyst. My department uses third parties to help bring in more customers. Each of these third parties has a website showing how many customers it has brought to our company. Part of my job is collecting the numbers from each website and putting them into a report, which is a long, manual process. So far I've managed to log into some of our third-party websites and extract some data. However, there is one website I'm having trouble logging into: https://inspire.flg360.co.uk/SignIn.php. After logging in, I also need to request another URL in the same session to scrape the data from.
I have written some code that has been successful in logging into a different website that I need information from.
import requests
from bs4 import BeautifulSoup
import re
username = 'username'
password = 'password'
scrape_url = 'https://portal.mvfglobal.com/index.php/dashboard'
login_url = 'https://portal.mvfglobal.com/index.php/login/login'
login_info = {'login_name': username, 'login_pass': password}
#Start session.
session = requests.session()
#Login using your authentication information.
session.post(url=login_url, data=login_info)
#Request page you want to scrape.
url = session.get(url=scrape_url)
soup = BeautifulSoup(url.content, 'html.parser')
print(soup)
However, when I try to log into https://inspire.flg360.co.uk/SignIn.php using the same approach, I run into problems.
import requests
from bs4 import BeautifulSoup
username = 'username'
password = 'password'
login_url = 'https://inspire.flg360.co.uk/SignIn.php'
login_info = {'strEmail': username, 'strPassword': password}
scrape_url = 'https://inspire.flg360.co.uk/AuthUser.php'
#Start session.
session = requests.session()
#Login using your authentication information.
session.post(url=login_url, data=login_info)
#Request page you want to scrape.
url = session.get(url=scrape_url)
soup = BeautifulSoup(url.content, 'html.parser')
print(soup)
When I inspected the page, I noticed that the 302 response redirects to https://inspire.flg360.co.uk/AuthUser.php. However, when I try to log in using the code above, I still get errors.
I'm completely stumped. Any ideas?
Final code below:
import requests
from bs4 import BeautifulSoup
import hashlib
username = 'username'
password = 'password'
login_url = 'https://inspire.flg360.co.uk/AuthUser.php'
login_info = {"strForwardURL": "",
              "strEmail": username,
              "intRememberMe": 1,
              "strResponse": ""}
scrape_url = 'https://inspire.flg360.co.uk/ma/index.php'
# Start session.
session = requests.session()
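# Fetch the login page and read the one-time strChallenge value from its hidden input.
login_page = session.get(url=login_url)
strc = BeautifulSoup(login_page.content, 'html.parser').find_all(attrs={"name": "strChallenge"})[0]['value']
# strResponse is the MD5 hex digest of strChallenge joined with the MD5 hex digest of the password.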
strc_joined = strc + hashlib.md5(password.encode("utf-8")).hexdigest()
strresponse = hashlib.md5(strc_joined.encode("utf-8")).hexdigest()
login_info['strResponse'] = strresponse
#Login using your authentication information.
session.post(url=login_url, data=login_info)
# Request page you want to scrape.
url = session.get(url=scrape_url)
soup = BeautifulSoup(url.content, 'html.parser')
print(soup)
Upvotes: 3
Views: 506
It looks like the actual POST request sent by the page at https://inspire.flg360.co.uk/SignIn.php includes a few more fields that are required. The POST data actually looks something like:
strForwardURL=&strEmail=abc%40123.com&intRememberMe=1&strResponse=fdb4c46c5d0eeab6133be193afc7897e
The fields are strForwardURL, strEmail, intRememberMe, and strResponse. Looking at the rest of the code on the page, clicking the submit button triggers this bit of JavaScript:
function fncSignIn() {
    var loginForm = document.getElementById("signinForm");
    if (loginForm.strEmail.value == "") {
        alert("Please enter your email address.");
        return false;
    }
    if (loginForm.strPassword.value == "") {
        alert("Please enter your password.");
        return false;
    }
    var submitForm = document.getElementById("submitForm");
    submitForm.strEmail.value = loginForm.strEmail.value;
    if (loginForm.intRememberMe.checked) submitForm.intRememberMe.value = 1;
    submitForm.strResponse.value = hex_md5(loginForm.strChallenge.value+hex_md5(loginForm.strPassword.value));
    submitForm.submit();
}
Elsewhere on the page, you can find the strChallenge string here:
<input type="hidden" name="strChallenge" value="1d989603e448a1a0559f08bdc83a15522fbc6c0404ca66acc4cdd7aafe4039359e2fb23b706d60a3">
(this value changes on reload, by the way)
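Because the challenge changes on every load, it has to be read from the freshly fetched page each time rather than hard-coded. A minimal sketch of that extraction follows. It uses Python's standard-library html.parser instead of BeautifulSoup purely to stay dependency-free, and the names ChallengeParser and extract_challenge are made up for illustration:

```python
from html.parser import HTMLParser


class ChallengeParser(HTMLParser):
    """Remembers the value of the first <input name="strChallenge"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.challenge = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs for the opened tag.
        attributes = dict(attrs)
        is_challenge = tag == "input" and attributes.get("name") == "strChallenge"
        if is_challenge and self.challenge is None:
            self.challenge = attributes.get("value")


def extract_challenge(html):
    """Return the strChallenge value found in html, or None if there is none."""
    parser = ChallengeParser()
    parser.feed(html)
    return parser.challenge


# The hidden input quoted above, wrapped in a form for the example.
sample_html = (
    '<form id="signinForm">'
    '<input type="hidden" name="strChallenge" '
    'value="1d989603e448a1a0559f08bdc83a15522fbc6c0404ca66acc4cdd7aafe4039359e2fb23b706d60a3">'
    '</form>'
)
challenge = extract_challenge(sample_html)
print(challenge)
```

In a live session the HTML would come from the text of the response to a GET on the sign-in page, made with the same requests session that later sends the POST.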
Essentially, instead of the plain-text password, it's asking for the MD5 hex digest of the strChallenge string joined with the MD5 hex digest of the password.
In Python, it would be something like this:
import hashlib
password = "abcdefg12345"
strc = "1d989603e448a1a0559f08bdc83a15522fbc6c0404ca66acc4cdd7aafe4039359e2fb23b706d60a3"
strc_joined = strc + hashlib.md5(password.encode("utf-8")).hexdigest()
strresponse = hashlib.md5(strc_joined.encode("utf-8")).hexdigest()
print(strresponse)
And the output in this example would be 0d289f39067a25430d4818fe38046372
Change the POST data in your original request to:
{"strForwardURL":"", "strEmail":"abc@123.com", "intRememberMe": 1, "strResponse": "0d289f39067a25430d4818fe38046372"}
and you should be able to log in. Every time you want to scrape a page that requires this particular login, you should be able to simply grab the strChallenge with BeautifulSoup4, calculate the proper strResponse, and log in.
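To tie the pieces together, here is a rough sketch that builds the four-field payload from an email, a password, and a challenge, then form-encodes it so it can be compared with the POST body observed in the browser. The helper names md5_hex and build_login_payload are invented for this example, and the credentials are the placeholder values used earlier in this answer:

```python
import hashlib
from urllib.parse import urlencode


def md5_hex(text):
    """MD5 hex digest of a UTF-8 string, standing in for the page's hex_md5()."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_login_payload(email, password, challenge):
    """Build the form fields the sign-in page submits (hypothetical helper)."""
    # Same composition as the page's JavaScript: challenge + md5(password), hashed again.
    response = md5_hex(challenge + md5_hex(password))
    return {
        "strForwardURL": "",
        "strEmail": email,
        "intRememberMe": 1,
        "strResponse": response,
    }


payload = build_login_payload(
    "abc@123.com",
    "abcdefg12345",
    "1d989603e448a1a0559f08bdc83a15522fbc6c0404ca66acc4cdd7aafe4039359e2fb23b706d60a3",
)

# Form-encode the payload to compare its shape with the observed POST data.
encoded = urlencode(payload)
print(encoded)
```

Passing the same dict as data= to session.post() should produce an equivalent form-encoded body, so the encoding step here is only for checking the field names and order against what the browser sends.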
Upvotes: 4