Reputation: 31189
In the first step of the html5lib tutorial I see pretty confusing behavior.
The docs say:
import html5lib
f = open("mydocument.html")
doc = html5lib.parse(f)
This will return a tree in a custom "simpletree" format.
My file is a normal HTML document. But in my case the result is:
<None>
>>> doc is None
False
I believe this is not OK, but I have no idea what is happening.
If I call the read method on the opened file, it returns the file contents as a string:
f = open("mydocument.html")
f.read()
# returns string with html
And after doc = html5lib.parse(f), f.read() returns an empty string, as if the file had already been read.
Upvotes: 0
Views: 249
Reputation: 69082
The <None> doesn't really mean that your document was not parsed; it just means that your document has no name. If you do
doc.name = "test"
print(doc)
it should show <test>
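To see why an unnamed root prints that way, here is a minimal sketch, assuming the tree's node class builds its repr from the name attribute alone. The Node class below is a hypothetical stand-in written for illustration, not html5lib's actual code:

```python
class Node:
    """Hypothetical stand-in for a tree node; not html5lib's real class."""

    def __init__(self, name=None):
        self.name = name

    def __repr__(self):
        # The repr reports only the node's name, nothing about its contents.
        return "<%s>" % self.name


doc = Node()           # a document root created without a name
unnamed = repr(doc)    # built from name=None
doc.name = "test"
named = repr(doc)      # built from name="test"

print(unnamed, named, doc is None)
```

The object itself is a real, non-None node in both cases; only the label in its repr changes.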
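parse can also take a string as argument. As far as I know, html5lib treats that string as the markup itself rather than as a filename, so pass the file's contents (for example f.read()) instead of a path.
To see the parsed tree, try print(doc.toxml())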
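As for f.read() returning an empty string after parsing: that is ordinary file-object behavior, not a sign of failure. Anything that reads the stream to the end leaves the position at end of file. A minimal sketch using only the standard library, where a plain read() stands in for the parser consuming the file (the temporary file and its contents are made up for the example):

```python
import os
import tempfile

# Write a small throwaway HTML file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
    tmp.write("<html><body><p>hi</p></body></html>")
    path = tmp.name

with open(path) as f:
    first = f.read()     # consumes the whole file, as a parser reading f would
    second = f.read()    # position is now at end of file, so nothing is left
    f.seek(0)            # rewind to the beginning
    third = f.read()     # the full contents are available again

os.remove(path)
print(repr(second), third == first)
```

If you need the raw text after parsing, call f.seek(0) first or reopen the file.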
Upvotes: 1