tesla john
tesla john

Reputation: 350

How to send requests to a downloaded html file

I have a .html file downloaded and want to send a request to this file to grab it's content.

However, if I do the following:

import requests
html_file  = "/user/some_html.html"
r = requests.get(html_file)

Gives the following error:

Invalid URL 'some_html.html': No schema supplied.

If I add a schema I get the following error:

HTTPConnectionPool(host='some_html.html', port=80): Max retries exceeded with url:

I want to know how to specifically send a request to a html file when it's downloaded.

Upvotes: 0

Views: 3203

Answers (5)

Gul Khetab
Gul Khetab

Reputation: 1

As other users said, the requests.get() method requires http connection to fetch an html file from a website not from a local directory. In order to request an html file saved locally on your computer, use urlopen() function of urllib.request module as follows:

from urllib.request import urlopen

response = urlopen('file:example.html')
byts = response.read()
html = byts.decode()
print(html)

Assuming that example.html is saved in the current working directory. Reading the response will return a bytes object and decoding the bytes object will return the html code of the requested html file.

Upvotes: 0

memr404
memr404

Reputation: 1

You can try this:

from requests_html import HTML
with open("htmlfile.html") as htmlfile:
    sourcecode = htmlfile.read()
    parsedHtml = HTML(html=sourcecode)
print(parsedHtml)

Upvotes: 0

Rogerio Kurek
Rogerio Kurek

Reputation: 56

You can do it by setting up a local server to your html file. If you use Visual Studio Code, you can install Live Server by Ritwick Dey.

Then you do as follows:

1 - Make the first request and save the html content into a .html file:

my_req.py

import requests

file_path = './'
file_name = 'my_file'

url = "https://www.qwant.com/"

response = requests.request("GET", url)

w = open(file_path + file_name + '.html', 'w')
w.write(response.text)

2 - With Live Server installed on Visual Studio Code, click on my_file.html and then click on Go Live.

Go Live

and

3 - Now you can make a request to your local http schema:

second request

import requests

url = "http://127.0.0.1:5500/my_file.html"

response = requests.request("GET", url)

print(response.text)

And, tcharan!! do what you need to do.

On a crawler work, I had one situation where there was a difference between the content displayed on the website and the content retrieved with the response.text so the xpaths did not were the same as on the website, so I needed to download the content, making a local html file, and get the new ones xpaths to get the info that I needed.

Upvotes: 1

Code-Apprentice
Code-Apprentice

Reputation: 83527

You don't "send a request to a html file". Instead, you can send a request to a HTTP server on the internet which will return a response with the contents of a html file.

The file itself knows nothing about "requests". If you have the file stored locally and want to do something with it, then you can open it just like any other file.

If you are interested in learning more about the request and response model, I suggest you try a something like

response = requests.get("http://stackoverflow.com")

You should also read about HTTP and requests and responses to better understand how this works.

Upvotes: 0

M. Twarog
M. Twarog

Reputation: 2623

You are accessing html file from local directory. get() method uses HTTPConnection and port 80 to access data from website not a local directory. To access file from local directory using get() method use Xampp or Wampp. for accessing file from local directory you can use open() while requests.get() is for accessing file from Port 80 using http Connection in simple word from internet not local directory

import requests
html_file  = "/user/some_html.html"
t=open(html_file, "r")
for v in t.readlines():
  print(v)

Output: enter image description here

Upvotes: 1

Related Questions