What is the difference between BeautifulSoup's site.content and site.read()?

Question

When I use a local HTML file stored on my laptop,

from bs4 import BeautifulSoup
site = open('smpl.htm', 'r')
page = BeautifulSoup(site.content, 'html.parser')
print(page)

returns (in the cmd):

Traceback (most recent call last):
File "c:/~~~~~~/python/h.py", line 3, in <module>
page = BeautifulSoup(site.content, 'html.parser')
AttributeError: '_io.TextIOWrapper' object has no attribute 'content'

but by replacing site.content with site.read(), the code shows the correct HTML and performs operations on it without any problems.

However, if I get my HTML file from the web through requests, then I'll have to write site.content and not site.read() to parse it.

What is the difference between content and read() and which is appropriate for what?

William McEnaney · Accepted Answer

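Neither content nor read() belongs to BeautifulSoup; each comes from the object you pass to it. Opening an HTML file on your laptop with open() returns a TextIOWrapper (a file object), which has a read() method that returns the contents of the file as a string. It has no content attribute, hence the AttributeError.

Fetching a web page uses a different class with different attributes. requests.get() returns a requests.Response object, whose content attribute holds the body of the response as bytes (its text attribute holds the same body decoded to a string). A Response has no read() method, so use read() for files and content (or text) for requests responses.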
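
As a rough sketch of the two cases side by side: the function names soup_from_file and soup_from_url are made up for illustration, and the utf-8 encoding and 10-second timeout are assumptions rather than anything from the question.

```python
from bs4 import BeautifulSoup


def soup_from_file(path):
    # open() returns a file object (TextIOWrapper); read() returns its text as str.
    # encoding='utf-8' is an assumption about how the file was saved.
    with open(path, 'r', encoding='utf-8') as f:
        return BeautifulSoup(f.read(), 'html.parser')


def soup_from_url(url):
    # requests is third-party; it is imported here so the file example
    # still works when requests is not installed.
    import requests

    # requests.get() returns a Response; .content is the body as bytes.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return BeautifulSoup(response.content, 'html.parser')
```

With the file from the question, soup_from_file('smpl.htm') should give the same result as replacing site.content with site.read(). BeautifulSoup also accepts an open file object directly, so BeautifulSoup(f, 'html.parser') would work inside the with block as well.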
