Alexandre Vandermonde

Reputation: 579

Trouble with requests/Beautiful soup

I'm trying to learn to use some web features of Python, and thought I'd practice by writing a script to log in to a webpage at my university. Initially I wrote the code using urllib2, but user alecxe kindly provided me with code using requests/BeautifulSoup (please see: Website form login using Python urllib2).

I am trying to log in to the page http://reg.maths.lth.se/. The page features one login form for students and one for teachers (I am trying to log in as a student). To log in, one should provide a "Personnummer", which is basically the equivalent of a social security number, so I don't want to post my valid number. However, I can reveal that it should be 10 digits long.

The code I was provided (with a small change to the final print statement) is given below:

import requests
from bs4 import BeautifulSoup

PNR = "00000000"

url = "http://reg.maths.lth.se/"
login_url = "http://reg.maths.lth.se/login/student"
with requests.Session() as session:
    # extract token
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]

    # submit form
    session.post(login_url, data={
        "_token": token,
        "pnr": PNR
    })

    # navigate to the main page again (should be logged in)
    #response = session.get(url) ##This is deliberately commented out

    soup = BeautifulSoup(response.content, "html.parser")
    print(soup)

It is thus supposed to print the source code of the page obtained after POSTing the pnr.

The code runs, but it always prints the source code of the main page http://reg.maths.lth.se/, which is not correct. For example, if you manually enter a pnr of the wrong length, e.g. 0, you should be directed to a page which looks like this:

(screenshot of the resulting page) It is located at the URL http://reg.maths.lth.se/login/student, and its source code is obviously different from that of the main page.

Any suggestions?

Upvotes: 2

Views: 786

Answers (1)

AKX

Reputation: 168893

You aren't assigning the POST result to response, so you are just printing out the result of the first GET request.

So,

# submit form
session.post(login_url, data={
    "_token": token,
    "pnr": PNR
})

should be

response = session.post(login_url, data={
    "_token": token,
    "pnr": PNR
})
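
To see the difference without touching the real site, here is a minimal sketch using a stand-in session. FakeSession, buggy, fixed and the dummy page bodies are all made up for illustration; they are not part of requests, and the real page contents will differ. It only shows that the variable response keeps pointing at the GET result unless you rebind it:

```python
from types import SimpleNamespace


class FakeSession:
    """Hypothetical stand-in for requests.Session; makes no network calls."""

    def get(self, url):
        # Pretend this is the main page with the login form.
        return SimpleNamespace(url=url, content=b"<html>main page</html>")

    def post(self, url, data=None):
        # Pretend this is whatever the server sends back after the form POST.
        return SimpleNamespace(url=url, content=b"<html>page after login attempt</html>")


def buggy(session, url, login_url, form):
    response = session.get(url)
    session.post(login_url, data=form)  # return value is thrown away
    return response.content             # still the body of the first GET


def fixed(session, url, login_url, form):
    response = session.get(url)
    response = session.post(login_url, data=form)  # rebind to the POST result
    return response.content


url = "http://reg.maths.lth.se/"
login_url = "http://reg.maths.lth.se/login/student"
form = {"_token": "dummy-token", "pnr": "0"}  # dummy values, not real credentials

print(buggy(FakeSession(), url, login_url, form))
print(fixed(FakeSession(), url, login_url, form))
```

With a real requests session, printing response.url after the POST is a quick way to check which page you actually ended up on, since it holds the final URL after any redirects.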

Upvotes: 3
