Reputation: 1
def main:
    with open(sourcefile, 'r', encoding='utf-8') as main_file:
        for line in main_file:
            htmlcontent = reader(line)

def reader(line):
    with urllib.request.urlopen(line) as url_file:
        try:
            url_file.read().decode('UTF-8')
        except urllib.error.URLError as url_err:
            print('Error opening url: ', url, url_err)
        except UnicodeDecodeError as decode_err:
            print('Error decoding url: ', url, decode_err)
    return url_file
Hello everyone, I am pretty new to Python and I have a question about reading the HTML code of a website. I am trying to simply return the HTML code of a page so that I can run regular expressions on it. The variable line
holds a URL taken from a text file that has one URL per line, and the code iterates through that file. This is my code so far, but multiple errors pop up. I know that I have to use the else
clause, but I don't know how to incorporate it. I intend to use the returned HTML as the subject for a regex, and I would like to fetch the HTML with the urllib.request library.
Upvotes: 0
Views: 55
Reputation: 17408
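It's better to use the requests module, which turns this into a one-liner. Note that the URL has to include the scheme (http:// or https://), otherwise requests raises an error.
import requests
html = requests.get("http://www.domain.tld").text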
Upvotes: 2
Reputation: 716
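This saves the website content in html_content
and prints it. In Python 3, urlopen lives in urllib.request, and the URL needs a scheme.
import urllib.request
url = "http://www.domain.tld"
# The with block closes the connection automatically
with urllib.request.urlopen(url) as seed_url:
    # read() returns bytes; decode them to get a str (assuming UTF-8)
    html_content = seed_url.read().decode("utf-8")
print(html_content)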
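Since the question asks specifically about urllib.request and the else clause, here is a minimal sketch of how the pieces might fit together. It addresses three things in the question's code: urlopen is called outside the try block, so its errors are never caught; the print calls use a name url that is never defined; and the function returns the response object instead of the decoded text. The names fetch_html and read_pages and the timeout parameter are my own choices for illustration, not something from the question, and UTF-8 is assumed as the page encoding.

```python
import urllib.error
import urllib.request


def fetch_html(url, timeout=10):
    """Return the decoded page body for url, or None if fetching or decoding fails."""
    try:
        # urlopen itself can raise, so it has to sit inside the try block.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
    except (urllib.error.URLError, ValueError) as err:
        # URLError covers network and HTTP failures;
        # ValueError is raised for URLs without a scheme.
        print('Error opening url:', url, err)
        return None
    else:
        # The else clause runs only when the try block raised nothing.
        try:
            return raw.decode('utf-8')  # assumption: the page is UTF-8
        except UnicodeDecodeError as err:
            print('Error decoding url:', url, err)
            return None


def read_pages(sourcefile):
    """Yield (url, html) pairs for every non-empty line in sourcefile."""
    with open(sourcefile, 'r', encoding='utf-8') as main_file:
        for line in main_file:
            url = line.strip()  # drop the trailing newline before opening
            if url:
                yield url, fetch_html(url)
```

Each html value is either a str that can be passed to re.search and friends, or None when the page could not be read, so check for None before applying a regex.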
Upvotes: 0