user1306802

Reputation: 71

Save HTML Source Code to File

How can I copy the source code of a website into a text file in Python 3?

EDIT: To clarify my issue, here's what I have:

import urllib.request

def extractHTML(url):
    f = open('temphtml.txt', 'w')
    page = urllib.request.urlopen(url)
    pagetext = page.read()
    f.write(pagetext)
    f.close()

extractHTML('http:www.google.com')

I get the following error for the f.write() function:

builtins.TypeError: must be str, not bytes

Upvotes: 7

Views: 11561

Answers (3)

Xantium

Reputation: 11605

Try this.

import urllib.request
def extractHTML(url):
    urllib.request.urlretrieve(url, 'temphtml.txt')

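This is simpler, but if you still want to read the page and write it yourself, this works:

import urllib.request

def extractHTML(url):
    page = urllib.request.urlopen(url)
    # read() returns bytes; decode them to get a str (UTF-8 is assumed here)
    pagetext = page.read().decode('utf-8', errors='replace')
    # write the text out with an explicit encoding
    f = open('temphtml.txt', 'w', encoding='utf-8')
    f.write(pagetext)
    f.close()

extractHTML('https://www.google.com')

Your script raised the error because f.write() on a file opened in text mode needs a str, but page.read() returns bytes. Convert the bytes to a string with .decode(). Calling str() on the bytes would also silence the error, but it writes the b'...' representation, escape sequences included, rather than the page text.

Next I got an error saying no host was given. Most importantly, you forgot the // after the scheme: the URL has to be http://www.google.com, not http:www.google.com. Google also serves the page over https, so I used https:// here.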
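If you would rather not hardcode the encoding, the bytes from read() can be decoded with whatever charset the server reports. This is only a sketch: fetch_html_text and save_html are names I made up, and falling back to UTF-8 when no charset is declared is my assumption, not something the server guarantees.

```python
import urllib.request

def fetch_html_text(url, fallback_encoding="utf-8"):
    """Return the page body as str, decoded with the charset the response declares."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, not str
        # Charset from the Content-Type header, or None if it is not given
        charset = response.headers.get_content_charset() or fallback_encoding
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising
    return raw.decode(charset, errors="replace")

def save_html(url, path="temphtml.txt"):
    """Download url and write its text to path as UTF-8."""
    text = fetch_html_text(url)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```

Calling save_html('https://www.google.com') should then write the page text to temphtml.txt. The with blocks close the response and the file even if something fails partway.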

Upvotes: 1

user3105498

Reputation: 11

You probably wanted something like this:

import urllib.request

class ExtractHtml():

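    def Page(self):
        # ask for the address, download the page, and save the raw bytes
        print("enter the web page name starting with 'http://': ")
        url = input()
        site = urllib.request.urlopen(url)
        data = site.read()
        # binary mode, so the bytes are written without decoding
        file = open("D://python_projects/output.txt", "wb")
        file.write(data)
        file.close()


w = ExtractHtml()
w.Page()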

Upvotes: 0

Jack

Reputation: 760

import urllib.request
site = urllib.request.urlopen('http://somesite.com')
data = site.read()
file = open("file.txt","wb") #open file in binary mode
file.write(data) #data is one bytes object, so write() it; writelines() would iterate over it as ints and fail
file.close()

Untested but should work.

EDIT: Updated for python3
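The same binary-mode idea can also be written so the page is streamed to disk rather than read into memory first. A rough sketch, where download_raw is a name I picked for illustration:

```python
import shutil
import urllib.request

def download_raw(url, path="file.txt"):
    """Copy the response body to path byte-for-byte, without decoding it."""
    # Both the response and the output file are closed when the block exits
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        # copyfileobj reads and writes in chunks, so large pages stay off the heap
        shutil.copyfileobj(response, out)
    return path
```

Because nothing is decoded, the file holds exactly the bytes the server sent, so you do not need to know the page's encoding up front.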

Upvotes: 3
