sbru

Why am I losing functionality of a web page after using Python Requests module?

I am trying to download a file from a webpage, but I have to log in first. I am using the Python Requests module, and I think I'm doing it right, because when I print the HTML of the GET response it's all there. However, none of the styling is present and none of the links work when I open the saved file in a browser. My code is below; 'username' and 'password' stand in for my actual username and password.

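import requests

loginurl = 'https://www.example.com/login'
username = 'username'
password = 'password'
url = 'https://www.example.com/secured_page_containing_file'

# Keys should match the name attributes of the login form's input fields
payload = {
    'UserName': username,
    'Password': password
}

with requests.Session() as s:
    # The session stores the cookies set by the login response
    s.post(loginurl, data=payload)

    # Reuses those cookies when requesting the protected page
    r = s.get(url)

# A with-block guarantees the file is flushed and closed
with open('a.html', 'w', encoding='utf-8') as f:
    f.write(r.text)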

Again, this does extract the HTML of 'https://www.example.com/secured_page_containing_file', but the page's functionality isn't there. Any help is greatly appreciated. Thanks!

Answers (1)

Martijn Pieters

If you open the saved file in your browser, the page is now loaded from a different location: a local file on your disk rather than the original server URL. Any relative URLs in the page (stylesheets, scripts, links) are resolved relative to that new location, so none of them will work. You'd have to rewrite those URLs as absolute URLs for this to work at all.
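A rough sketch of that rewrite, assuming a regex over `href`, `src` and `action` attributes is good enough for the page in question (a real HTML parser would be more robust). `urllib.parse.urljoin` resolves each value against the page's original URL and leaves already-absolute URLs untouched. The `absolutize_urls` helper is hypothetical, not part of requests.

```python
import re
from urllib.parse import urljoin

# Matches href="...", src='...', action="..." attribute values.
# Regex-based HTML rewriting is a rough sketch; it can miss unusual markup.
_ATTR_RE = re.compile(r'''(\b(?:href|src|action)\s*=\s*)(["'])(.*?)\2''',
                      re.IGNORECASE)


def absolutize_urls(html, base_url):
    """Rewrite relative href/src/action values in html to absolute URLs."""
    def repl(match):
        prefix, quote, value = match.groups()
        # urljoin resolves relative values and returns absolute ones as-is
        return f'{prefix}{quote}{urljoin(base_url, value)}{quote}'
    return _ATTR_RE.sub(repl, html)
```

In the question's code this would be `f.write(absolutize_urls(r.text, r.url))`, where `r.url` is the final URL after any redirects. Resources that themselves sit behind the login may still fail to load, since your browser does not have the session's cookies.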

This is quite apart from the fact that webservers can alter their response based on any number of factors, including what headers you sent when requesting the page, and the page can alter behaviour when JavaScript code associated with the page is executed by your browser.
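To rule out the headers factor, one option is to make requests look more like a browser. The header values below are illustrative assumptions, not values this site is known to require; the headers your own browser sends (visible in its developer tools) are a closer match. `make_browser_like_session` is a hypothetical helper.

```python
import requests

# Illustrative browser-like headers; which ones a given server actually
# inspects is an assumption you would need to verify for that site.
BROWSER_HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/120.0 Safari/537.36'),
    'Accept': 'text/html,application/xhtml+xml,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
}


def make_browser_like_session():
    """Return a Session whose requests all carry BROWSER_HEADERS."""
    s = requests.Session()
    # Session-level headers are merged into every request made with s
    s.headers.update(BROWSER_HEADERS)
    return s
```

Headers cannot help with content that JavaScript builds in the browser: requests never executes scripts, it only returns the HTML the server sent.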

All this has nothing to do with requests or Python, really.
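If the goal is the file itself rather than a browsable copy of the page, a sketch under stated assumptions is to request the file's URL directly with the same logged-in session. The `download_file` helper and whatever file URL you pass it are hypothetical; the real link would have to be found in the HTML you already retrieved.

```python
def download_file(session, file_url, dest_path, chunk_size=8192):
    """Stream file_url to dest_path using an already logged-in session."""
    # stream=True avoids loading the whole file into memory at once
    with session.get(file_url, stream=True) as r:
        r.raise_for_status()
        # Binary mode: the download may not be text (PDF, ZIP, ...)
        with open(dest_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return dest_path
```

Call it inside the `with requests.Session() as s:` block after the login POST, so the session's cookies are sent along with the file request.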