geuristic

Reputation: 13

What is the difference between BeautifulSoup's site.content and site.read()?

When I parse a local HTML file stored on my laptop, this code:

from bs4 import BeautifulSoup
site = open('smpl.htm', 'r')
page = BeautifulSoup(site.content, 'html.parser')
print(page)

returns (in the cmd):

Traceback (most recent call last):
  File "c:/~~~~~~/python/h.py", line 3, in <module>
    page = BeautifulSoup(site.content, 'html.parser')
AttributeError: '_io.TextIOWrapper' object has no attribute 'content'

but by replacing site.content with site.read(), the code shows the correct HTML and performs operations on it without any problems.

However, if I fetch the HTML from the web with requests, I have to write site.content rather than site.read() to parse it.

What is the difference between content and read() and which is appropriate for what?

Upvotes: 0

Views: 194

Answers (1)

William McEnaney

Reputation: 48

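Opening an HTML file on your laptop with open() returns a TextIOWrapper (an ordinary text file object), which has a read() method that returns the file's contents as a string. It has no content attribute, which is why you get the AttributeError.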

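Fetching a web page gives you an object of a different class with different attributes. Assuming you call requests.get(), it returns a requests.Response object, which has no read() method; its content attribute holds the response body as bytes (and its text attribute holds the same body decoded to a string).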
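As a rough sketch of the difference, the snippet below pulls the HTML out of either kind of object. Note that html_source is a hypothetical helper name, io.StringIO stands in for the file returned by open('smpl.htm', 'r'), and fake_response is only a stand-in that mimics the content attribute of a real requests.Response, so the example runs without a file on disk or a network connection.

```python
import io
from types import SimpleNamespace


def html_source(obj):
    """Return the HTML held by a file-like or a Response-like object.

    Hypothetical helper for illustration; not part of bs4 or requests.
    """
    if hasattr(obj, "read"):
        # File objects from open() (and io.StringIO) expose read().
        return obj.read()
    if hasattr(obj, "content"):
        # requests.Response exposes the body as bytes in .content.
        return obj.content
    raise TypeError(f"cannot get HTML from {type(obj).__name__}")


# Stand-in for open('smpl.htm', 'r'): a text file object.
local_file = io.StringIO("<p>local</p>")

# Stand-in for requests.get(url): only the .content attribute is mimicked.
fake_response = SimpleNamespace(content=b"<p>remote</p>")

local_html = html_source(local_file)      # a str, from read()
remote_html = html_source(fake_response)  # bytes, from .content

print(type(local_html).__name__, local_html)
print(type(remote_html).__name__, remote_html)
```

Either result can then be passed to BeautifulSoup(..., 'html.parser'), since it accepts both str and bytes. BeautifulSoup also accepts an open file handle directly, so BeautifulSoup(site, 'html.parser') should work for the local file without calling read() yourself.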

Upvotes: 1
