Jckxxlaw

Reputation: 391

Automatically gather data to POST in Python

I'm writing a script to automatically scrape PDFs from my university Moodle page. The PDFs are only accessible after logging in. I'm using requests (with requests.session) to fill in the login form and POST my login details to generate a cookie so I can access the files.

The issue is that the login form doesn't just take my username and password; it also takes several variables generated when the login page is loaded, including a unique token (these are all invisible to the user). I have successfully logged in from Python by extracting these variables with Beautiful Soup and adding them to the payload when posting the login form, like so:

import bs4
import requests

username = input("Username: ")
password = input("Password: ")
moodleLoginURL = "https://auth.bath.ac.uk/login"

# Use one session so cookies set by the login page are sent with the POST
s = requests.Session()
r = s.get(moodleLoginURL)

# Pull each generated form field out of the login page by name
soup = bs4.BeautifulSoup(r.text, "html.parser")
token = soup.find('input', {'name' : 'execution'}).get('value')
lt = soup.find('input', {'name' : 'lt'}).get('value')
_eventId = soup.find('input', {'name' : '_eventId'}).get('value')
submit = soup.find('input', {'name' : 'submit'}).get('value')

payload = {"username" : username, "password" : password, "execution" : token, "lt" : lt, "_eventId" : _eventId, "submit" : submit}

s.post(moodleLoginURL, data = payload)

This works, but my issue with it is that it won't work with other websites, and it isn't resistant to updates by the website managers. Is there a way to automatically gather the data generated behind the scenes (that is, all of the data to be POSTed except user input) rather than manually extracting each variable specific to the given website? That way, I could log in to any website that automatically generates tokens etc., with possibly a slight modification needed for the username and password variables. Is this possible?

(If you'd like to look at the HTML for the login form, the site I'm trying to log into is here: https://auth.bath.ac.uk/login)

Upvotes: 0

Views: 134

Answers (1)

Vikas Ojha

Reputation: 6950

Usually all the extra variables are input tags with the type=hidden attribute, so you can do something like this:

payload = {}
for hidden_input_elem in soup.findAll('input', {'type' : 'hidden'}):
    payload[hidden_input_elem.get('name')] = hidden_input_elem.get('value')

and after this add other user input variables in the payload dict.
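As a fuller illustration of the two steps above, here is a minimal sketch, not a definitive implementation. The function name build_login_payload and the sample HTML are made up for the example; it assumes the page marks its generated fields as hidden inputs, and it skips hidden inputs that have no name attribute so they don't end up under a None key.

```python
import bs4


def build_login_payload(html, user_fields):
    """Collect every named hidden <input> in html, then overlay user_fields.

    build_login_payload is a hypothetical helper, not part of any library.
    """
    soup = bs4.BeautifulSoup(html, "html.parser")
    payload = {}
    for elem in soup.find_all("input", {"type": "hidden"}):
        name = elem.get("name")
        if name:  # skip hidden inputs that have no name attribute
            payload[name] = elem.get("value", "")
    # User-supplied values are added last, so they win on any name clash
    payload.update(user_fields)
    return payload


# Made-up sample markup standing in for a real login page
SAMPLE_HTML = """
<form action="/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="lt" value="LT-123">
  <input type="hidden" name="execution" value="e1s1">
  <input type="hidden" name="_eventId" value="submit">
  <input type="hidden" value="no-name">
</form>
"""

payload = build_login_payload(
    SAMPLE_HTML, {"username": "alice", "password": "secret"}
)
print(payload)
```

With a live site, the html argument would be r.text from the session's GET, and the result would go to s.post(..., data=payload). Fields that are not hidden inputs, such as a named submit button, are not picked up by this approach and may still need to be added by hand.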

-Edit: Corrected function 'findAll' capitalisation

Upvotes: 1
