Reputation: 13
When I use a local HTML file stored on my laptop, this code:
from bs4 import BeautifulSoup
site = open('smpl.htm', 'r')
page = BeautifulSoup(site.content, 'html.parser')
print(page)
returns (in the cmd):
Traceback (most recent call last):
  File "c:/~~~~~~/python/h.py", line 3, in <module>
    page = BeautifulSoup(site.content, 'html.parser')
AttributeError: '_io.TextIOWrapper' object has no attribute 'content'
But after replacing site.content with site.read(), the code prints the correct HTML and performs operations on it without any problems.
However, if I get my HTML file from the web through requests, then I have to write site.content and not site.read() to parse it.
What is the difference between content and read(), and when is each one appropriate?
Upvotes: 0
Views: 194
Reputation: 48
Opening an HTML file on your laptop with open() returns a TextIOWrapper (an ordinary text file object), which has a read() method that returns the contents of the file.
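Fetching a web page uses a different class with different attributes. With requests, you get back a requests.Response object, which has no read() method; its content attribute holds the body as bytes, and its text attribute holds the same body decoded to a string. So use read() on file objects and content (or text) on requests responses.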
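As a rough illustration of that difference, here is a minimal sketch that runs offline. The helper names markup_from_file and markup_from_response, and the StandInResponse class, are hypothetical and only stand in for a real requests.Response, on the assumption that all you need from the response is its content attribute.

```python
import os
import tempfile


def markup_from_file(path):
    """Read a local HTML file; open() gives a file object, which has read()."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()  # str; file objects have no .content attribute


def markup_from_response(response):
    """Take the body from a requests-style response via its content attribute."""
    return response.content  # bytes on a real requests.Response


class StandInResponse:
    """Hypothetical stand-in for requests.Response so the sketch runs offline."""

    def __init__(self, body):
        self.content = body


# Write a small sample file, then read it back the way the question's fix does.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "smpl.htm")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("<p>hello</p>")
    file_markup = markup_from_file(sample_path)

# With the real library this would be: requests.get(url).content
response_markup = markup_from_response(StandInResponse(b"<p>hello</p>"))

print(type(file_markup).__name__, type(response_markup).__name__)  # str bytes
```

Either value can then be passed as the first argument to BeautifulSoup(markup, 'html.parser'), which accepts both str and bytes. BeautifulSoup also accepts an open file handle directly, so BeautifulSoup(site, 'html.parser') should work for the local-file case without calling read() at all.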
Upvotes: 1