Reputation: 11
I am able to scrape data using BeautifulSoup, and now I am looking to generate a file containing all of the data I scraped.
file = open("copy.txt", "w")
data = soup.get_text()
file.write(data)
file.close()
I don't see all the tags and the entire content in the text file. Any idea on how to achieve this?
Upvotes: 1
Views: 4351
Reputation: 17322
You can use:
with open("copy.txt", "w") as file:
    file.write(str(soup))
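To see the difference this makes, here is a minimal, self-contained sketch using an inline HTML snippet in place of the scraped page (the markup and filename are hypothetical): str(soup) keeps the tags, while get_text() strips them all, which is why the original code lost the markup.

```python
from bs4 import BeautifulSoup

# Hypothetical inline document standing in for the scraped page.
html = "<html><body><p>Hello <b>world</b></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# str(soup) keeps the markup; get_text() would strip every tag.
with open("copy.txt", "w", encoding="utf-8") as file:
    file.write(str(soup))

with open("copy.txt", encoding="utf-8") as file:
    saved = file.read()
```

After this runs, saved still contains the <b> tag, whereas soup.get_text() returns just "Hello world".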
If you have a list of URLs to scrape and you want to store each scraped page in a different file, you can try:
my_urls = [url_1, url_2, ..., url_n]

for index, url in enumerate(my_urls):
    # some code to scrape the page into `soup`
    with open(f"scraped_{index}.txt", "w") as file:
        file.write(str(soup))
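The loop above can be sketched end to end without any network access by substituting inline HTML strings for the responses each URL would return (the page contents here are hypothetical placeholders):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the pages the real URLs would return.
pages = [
    "<html><body><h1>Page 0</h1></body></html>",
    "<html><body><h1>Page 1</h1></body></html>",
]

for index, html in enumerate(pages):
    soup = BeautifulSoup(html, "html.parser")
    # One output file per scraped page, named by its position in the list.
    with open(f"scraped_{index}.txt", "w", encoding="utf-8") as file:
        file.write(str(soup))
```

In the real version, each html string would come from something like requests.get(url).content.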
Upvotes: 1
Reputation: 614
Quick solution:
You just need to convert the soup to a string. Using a test site, in case others wish to follow:
from bs4 import BeautifulSoup as BS
import requests
r = requests.get("https://webscraper.io/test-sites/e-commerce/allinone")
soup = BS(r.content, "html.parser")
file = open("copy.txt", "w")
file.write(str(soup))
file.close()
Slightly better solution:
It's better practice to use a context manager for your file IO (use with):
from bs4 import BeautifulSoup as BS
import requests
r = requests.get("https://webscraper.io/test-sites/e-commerce/allinone")
soup = BS(r.content, "html.parser")

with open("copy.txt", "w") as file:
    file.write(str(soup))
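If you also want the saved markup indented for easier reading, soup.prettify() returns a formatted string; a small sketch on an inline document rather than the live site (the markup and filename are made up):

```python
from bs4 import BeautifulSoup

# Hypothetical inline document; the live-site response would work the same way.
html = "<html><body><p>Hi</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# prettify() returns the markup with one tag per line, indented.
with open("pretty.txt", "w", encoding="utf-8") as file:
    file.write(soup.prettify())
```

Passing encoding="utf-8" to open() also avoids UnicodeEncodeError on platforms whose default encoding can't represent every character in the page.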
Upvotes: 0