danielrhabib

Reputation: 1

Python post request using aiohttp to log on to website fails but works fine with requests library

I'm trying to convert a method that uses requests so that it uses aiohttp and makes async requests. Logging in to the website with requests works fine, but after converting to aiohttp, the POST request does not return the protected page and stays on the login page.

It won't let me post the original function without flagging it as spam, but it is essentially identical to the method below, except that it uses the requests library with no aiohttp or async.

Here is the updated function using aiohttp:

    async def asyncGetRequestSession(self, username, password):
        try:
            async with aiohttp.ClientSession() as requestSession:
                async with requestSession.get(
                        "https://hac.friscoisd.org/HomeAccess/Account/LogOn?ReturnUrl=%2fHomeAccess%2f") as loginScreenResponse:
                    loginScreenResponseText = await loginScreenResponse.text()

                    parser = BeautifulSoup(loginScreenResponseText, "lxml")

                    # scrapes verification token from login screen page that is required to authenticate
                    requestVerificationToken = parser.find('input', attrs={'name': '__RequestVerificationToken'})["value"]

                    # post request headers with verification token
                    requestHeaders = {
                        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) '
                                      'Chrome/36.0.1985.125 Safari/537.36',
                        'X-Requested-With': 'XMLHttpRequest',
                        'Host': 'hac.friscoisd.org',
                        'Origin': 'hac.friscoisd.org',
                        'Referer': "https://hac.friscoisd.org/HomeAccess/Account/LogOn?ReturnUrl=%2fhomeaccess%2f",
                        '__RequestVerificationToken': requestVerificationToken
                    }

                    # post request payload with verification token
                    requestPayload = {
                        "__RequestVerificationToken": requestVerificationToken,
                        "SCKTY00328510CustomEnabled": "False",
                        "SCKTY00436568CustomEnabled": "False",
                        "Database": "10",
                        "VerificationOption": "UsernamePassword",
                        "LogOnDetails.UserName": username,
                        "tempUN": "",
                        "tempPW": "",
                        "LogOnDetails.Password": password
                    }

                    async with requestSession.post(
                            "https://hac.friscoisd.org/HomeAccess/Account/LogOn?ReturnUrl=%2fHomeAccess%2f",
                            data=requestPayload,
                            headers=requestHeaders,
                    ) as pageDOM:

                        print(pageDOM.url)
                        # fails to log in and returns original url: https://hac.friscoisd.org/HomeAccess/Account/LogOn?ReturnUrl=/HomeAccess/

                        if pageDOM.url == "https://hac.friscoisd.org/HomeAccess/Account/LogOn?ReturnUrl=%2FHomeAccess%2F":
                            return HTTPException(status_code=400, detail="HAC Login Failed")

                        return requestSession
        except:
            raise HTTPException(status_code=500, detail="HAC Server Error")

My guess is that it has something to do with cookies not being preserved between the initial GET and the subsequent POST request. That might result in a different requestVerificationToken, which would explain the failure, since the requestVerificationToken changes every time the page is reloaded. However, I'm a beginner and not very sure.

I'd be happy to take any advice on what the issue may be or how to better diagnose it.

UPDATE: After printing pageDOM.request_info.headers in the async function and pageDOM.request.headers in the working original function and comparing the two, I found that the aiohttp version is missing .AuthCookie in its Cookie header.

Here is the output of printing pageDOM.request.headers in the working method:

Here is the output of printing pageDOM.request_info.headers in the current broken method:

The only difference is the .AuthCookie in the Cookie header. Although I'm not familiar with how cookies work, I think that for some reason the first GET request fails to apply: "Set-Cookie: .AuthCookie=; Domain=hac.friscoisd.org; expires=Tue, 12-Oct-1999 05:00:00 GMT; HttpOnly; Path=/; Secure". I'm just not sure why this is the case or how to fix it.
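A comparison like this can be made less error-prone by diffing the cookie names in the two Cookie request headers programmatically. The following is a minimal sketch using only the standard library. The two header strings are hypothetical placeholders (the real printed values are not reproduced here), and `cookie_names` is an illustrative helper, not part of requests or aiohttp.

```python
from http.cookies import SimpleCookie


def cookie_names(cookie_header: str) -> set:
    """Return the set of cookie names found in a Cookie request header value."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return set(jar.keys())


# Hypothetical placeholder values; substitute the Cookie headers you printed.
working_cookie_header = ".AuthCookie=PLACEHOLDER; SPIHACSiteCode="
broken_cookie_header = "SPIHACSiteCode="

# Names sent by the working client but not by the broken one.
missing = cookie_names(working_cookie_header) - cookie_names(broken_cookie_header)
print(missing)
```

With the placeholder values above, the only name reported as missing is `.AuthCookie`.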

Upvotes: 0

Views: 483

Answers (2)

BizzyVinci

Reputation: 348

The reason is aiohttp's URL encoding.

Your URL is changed from A to B:

A: https://hac.friscoisd.org/HomeAccess/Account/LogOn?ReturnUrl=%2fHomeAccess%2f

B: https://hac.friscoisd.org/HomeAccess/Account/LogOn?ReturnUrl=/HomeAccess/

Notice the change from %2f to / in the query string.
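To see why this matters, here is a small standard-library sketch comparing the two URLs. It does not use aiohttp at all; it only shows that A and B decode to the same value but are different strings on the wire, so a server that compares the raw query string may treat them differently. The variable names are illustrative.

```python
from urllib.parse import unquote, urlsplit

url_a = "https://hac.friscoisd.org/HomeAccess/Account/LogOn?ReturnUrl=%2fHomeAccess%2f"
url_b = "https://hac.friscoisd.org/HomeAccess/Account/LogOn?ReturnUrl=/HomeAccess/"

# Raw query strings, exactly as they would appear in the request line.
raw_a = urlsplit(url_a).query
raw_b = urlsplit(url_b).query

# Percent-decoded query strings, i.e. what the parameter "means".
decoded_a = unquote(raw_a)
decoded_b = unquote(raw_b)

same_on_the_wire = raw_a == raw_b          # False: "%2f" vs "/"
same_after_decoding = decoded_a == decoded_b  # True: both are "/HomeAccess/"
print(same_on_the_wire, same_after_decoding)
```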

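To send the URL exactly as written, wrap it in yarl.URL with encoded=True, which tells yarl that the string is already encoded and must not be re-quoted: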

    import yarl
    raw_url = 'https://hac.friscoisd.org/HomeAccess/Account/LogOn?ReturnUrl=%2fHomeAccess%2f'
    encoded_url = yarl.URL(raw_url, encoded=True)
    ...
    async with requestSession.get(encoded_url) as loginScreenResponse:
        ...

Another option is to pass the query through the params argument instead of writing it after ? in the URL:

    url = 'https://hac.friscoisd.org/HomeAccess/Account/LogOn'
    params = {'ReturnUrl': '%2fHomeAccess%2f'}
    session.get(url, params=params)

Upvotes: 0

Frank Yellin

Reputation: 11297

Your web site is returning two cookies:

    Set-Cookie: .AuthCookie=; expires=Tue, 12-Oct-1999 05:00:00 GMT; path=/; secure; HttpOnly
    Set-Cookie: SPIHACSiteCode=; path=/; HttpOnly

(Normally, I'd do this sort of response as a comment, but it's hard to put HTTP headers into comments.)
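For reading these headers: an empty value combined with an expires date in the past is the usual way a server asks the client to delete a cookie, so the first header appears to clear .AuthCookie rather than issue it. Below is a minimal standard-library sketch that parses the two headers above and flags which one is already expired. The `describe` helper is illustrative, not part of any library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from http.cookies import SimpleCookie

# The two Set-Cookie header values quoted above.
set_cookie_headers = [
    ".AuthCookie=; expires=Tue, 12-Oct-1999 05:00:00 GMT; path=/; secure; HttpOnly",
    "SPIHACSiteCode=; path=/; HttpOnly",
]


def describe(header: str) -> dict:
    """Summarise one Set-Cookie value: name, value, and whether it is already expired."""
    cookie = SimpleCookie()
    cookie.load(header)
    name, morsel = next(iter(cookie.items()))
    expires = morsel["expires"]  # empty string when the attribute is absent
    expired = bool(expires) and parsedate_to_datetime(expires) < datetime.now(timezone.utc)
    return {"name": name, "value": morsel.value, "expired": expired}


summaries = [describe(h) for h in set_cookie_headers]
for summary in summaries:
    print(summary)
```

Here .AuthCookie is reported with an empty value and as expired, while SPIHACSiteCode has no expires attribute and is treated as a session cookie.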

Upvotes: 0
