I'm using Beautiful Soup 4 to extract text from HTML files. With get_text() I can easily extract just the text, but when I try to write that text to a plain text file, I get the message "416." Here's the code I'm using:
from bs4 import BeautifulSoup
markup = open("example1.html")
soup = BeautifulSoup(markup)
f = open("example.txt", "w")
f.write(soup.get_text())
The console prints 416, but nothing gets written to the text file. Where have I gone wrong?
Upvotes: 2
Views: 7808
Reputation: 20343
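BeautifulSoup accepts either a string or an open file handle, so you can pass it the text with markup.read() or the file object itself. The real problem is that the output file is never closed. f.write() returns the number of characters written (that's the 416 the console echoes), and Python buffers the write, so the text may not reach the file until it is flushed or closed. Close both files when you're done: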
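from bs4 import BeautifulSoup
markup = open("example1.html")
# Name a parser explicitly; otherwise bs4 guesses one and emits a warning
soup = BeautifulSoup(markup.read(), "html.parser")
markup.close()
f = open("example.txt", "w", encoding="utf-8")
f.write(soup.get_text())
# Closing the file flushes the buffered text to disk
f.close()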
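To see where the 416 comes from, here is a small stdlib-only sketch. The temporary path and the sample text are hypothetical stand-ins for example.txt and soup.get_text(), and the "empty before flush" behavior is what CPython's default buffering typically does for small writes, not a guarantee. It shows that write() returns the number of characters written and that the text may not be visible in the file until it is flushed or closed.

```python
import os
import tempfile

# Hypothetical stand-in for the text returned by soup.get_text()
text = "Hello from get_text()\n"

path = os.path.join(tempfile.mkdtemp(), "example.txt")
f = open(path, "w", encoding="utf-8")

# write() returns the number of characters written; the interactive
# console echoes that value, which is where a number like 416 comes from
written = f.write(text)
print(written)

# A small write usually still sits in Python's buffer at this point,
# so a second reader may see an empty file
with open(path, encoding="utf-8") as reader:
    print(repr(reader.read()))

f.close()  # close() flushes the buffer to disk

with open(path, encoding="utf-8") as reader:
    contents = reader.read()
print(repr(contents))
```

Calling f.flush() instead of f.close() would also push the text out to the file while keeping it open, which can be handy in an interactive session.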
And in a more Pythonic style, using with statements so both files are closed automatically:
from bs4 import BeautifulSoup
with open("example1.html") as markup:
    soup = BeautifulSoup(markup.read(), "html.parser")
# The with block closes (and flushes) the output file on exit
with open("example.txt", "w", encoding="utf-8") as f:
    f.write(soup.get_text())
as @bernie suggested
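One reason the with form is preferred: the file is closed, and its buffer flushed, even if an exception interrupts the block. Here is a minimal stdlib-only sketch, using a hypothetical temporary path and a simulated error in place of something like a failing parse:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

try:
    with open(path, "w", encoding="utf-8") as f:
        f.write("partial text")
        # Simulated failure partway through writing (hypothetical)
        raise RuntimeError("something went wrong")
except RuntimeError:
    pass

# The with statement closed the file on the way out of the block
print(f.closed)

# Because closing flushed the buffer, the text written before the error is on disk
with open(path, encoding="utf-8") as reader:
    saved = reader.read()
print(repr(saved))
```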
Upvotes: 5