Reputation: 129
Is there a way to save my ResultSet object of BeautifulSoup to a file, then read the file and be able to use commands such as find_all?
For example, my code is
import requests
from bs4 import BeautifulSoup
#scraping
website_link = 'https://stackoverflow.com/'
request1 = requests.get(website_link)
source1 = request1.content
soup1 = BeautifulSoup(source1, 'lxml')
#saving
savefilename = 'question.txt'
with open(savefilename, "w", encoding="utf-8") as f:
    f.write(str(soup1))
In the step f.write(str(soup1)), I am converting this bs4.element ResultSet object into a string so that it can be saved. That conversion is the crucial part, and I have not found a way around it. Once the object has been converted into a string, is there a way to convert it back into a BeautifulSoup ResultSet object that would allow me to use .find_all() and similar commands again?
Upvotes: 2
Views: 667
Reputation: 20038
Just create another BeautifulSoup object:
import requests
from bs4 import BeautifulSoup
#scraping
website_link = 'https://stackoverflow.com/'
request1 = requests.get(website_link)
source1 = request1.content
soup1 = BeautifulSoup(source1, 'html.parser')
#saving
savefilename = 'question.txt'
with open(savefilename, "w", encoding="utf-8") as f:
    f.write(str(soup1))
# Open the saved file and parse its contents again
with open(savefilename, "r", encoding="utf-8") as f:
    # f.read() returns the whole file as one string
    soup2 = BeautifulSoup(f.read(), "html.parser")
>>> print(type(soup2))
<class 'bs4.BeautifulSoup'>
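To check that the re-parsed object really supports find_all again, here is a minimal sketch of the round trip. It assumes a small hand-written HTML string in place of the downloaded page and a temporary directory in place of the working directory; both are illustrative stand-ins and not part of the question.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
html = "<html><body><a href='/a'>A</a><a href='/b'>B</a></body></html>"
soup1 = BeautifulSoup(html, "html.parser")

# Save the parsed document as text (temporary location, for illustration only).
path = os.path.join(tempfile.mkdtemp(), "question.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(str(soup1))

# Read the text back as a single string and parse it again.
with open(path, "r", encoding="utf-8") as f:
    soup2 = BeautifulSoup(f.read(), "html.parser")

# The new object is a regular BeautifulSoup instance, so find_all works.
links = [a["href"] for a in soup2.find_all("a")]
print(links)  # ['/a', '/b']
```

Note that f.read() returns the file contents as one string. Passing str(f.readlines()) instead would hand the parser the text of a Python list, including its brackets, quotes, and escaped newlines, which then end up inside the parsed document.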
Upvotes: 2