Reputation: 100
I'm trying to download some files from behind a SSO (Single Sign-On) site. It seems to be SAML authenticated, that's where I'm stuck. Once authenticated I'll be able to perform API requests that return JSON, so no need to interpret/scrape.
Not really sure how to deal with that in mechanicalsoup (and relatively unfamiliar with web-programming in general), help would be much appreciated.
Here's what I've got so far:
import mechanicalsoup
from getpass import getpass
import json
login_url = ...
br = mechanicalsoup.StatefulBrowser()
response = br.open(login_url)
if verbose: print(response)
# provide the username + password
br.select_form('form[id="loginForm"]')
print(br.get_current_form().print_summary()) # Just to see what's there.
br['UserName'] = input('Email: ')
br['Password'] = getpass()
response = br.submit_selected().text
if verbose: print(response)
At this point I get a page telling me javascript is disabled and that I must click submit to continue. So I do:
br.select_form()
response = br.submit_selected().text
if verbose: print(response)
That's where I get a complaint about state information being lost.
Output:
<h2>State information lost</h2>
State information lost, and no way to restart the request<h3>Suggestions for resolving this problem:</h3><ul><li>Go back to the previous page and try again.</li><li>Close the web browser, and try again.</li></ul><h3>This error may be caused by:</h3><ul><li>Using the back and forward buttons in the web browser.</li><li>Opened the web browser with tabs saved from the previous session.</li><li>Cookies may be disabled in the web browser.</li></ul>
The only hits I've found on scraping behind SAML logins are all going with a selenium approach (and sometimes dropping down to requests).
Is this possible with mechanicalsoup?
Upvotes: 0
Views: 361
Reputation: 100
My situation turned out to require Javascript for login. My original question about getting into SAML auth was not the true environment. So this question has not truly been answered.
Thanks to @Daniel Hemberger for helping me figure that out in the comments.
In this situation MechanicalSoup is not the correct tool (due to Javascript) and I ended up using selenium to get through authenication then using requests.
Upvotes: 0