Reputation: 1
How can I scrape multiple pages from a website? This code is only working for the first one:
import csv
import requests
from bs4 import BeautifulSoup
import datetime
filename = "azet_" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M")+".csv"
with open(filename, "w+") as f:
writer = csv.writer(f)
writer.writerow(["Descriere","Pret","Data"])
r = requests.get("https://azetshop.ro/12-extensa?page=1")
soup = BeautifulSoup(r.text, "html.parser")
x = soup.find_all("div", "thumbnail")
for thumbnail in x:
descriere = thumbnail.find("h3").text.strip()
pret = thumbnail.find("price").text.strip()
writer.writerow([descriere, pret, datetime.datetime.now()])
Upvotes: 0
Views: 4388
Reputation: 363
This code works fine too to use class attribute with bs4:
import csv
import requests
from bs4 import BeautifulSoup
import datetime
filename = "azet_" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M")+".csv"
with open(filename, "w+") as f:
writer = csv.writer(f)
writer.writerow(["Descriere","Pret","Data"])
for i in range(1,50):
r = requests.get("https://azetshop.ro/12-extensa?page="+format(i))
soup = BeautifulSoup(r.text, "html.parser")
array_price= soup.find_all('span', class_='price')
array_desc=soup.find_all('h1', class_='h3 product-title',text=True)
for iterator in range(0,len(array_price)):
descriere = array_desc[iterator].text.strip()
pret = array_price[iterator].text.strip()
writer.writerow([descriere, pret, datetime.datetime.now()])
Upvotes: 0
Reputation: 1131
For multiple pages scraping using BeautifulSoup
, many usually do it using while
:
import csv
import requests
from bs4 import BeautifulSoup
import datetime
end_page_num = 50
filename = "azet_" + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M")+".csv"
with open(filename, "w+") as f:
writer = csv.writer(f)
writer.writerow(["Descriere","Pret","Data"])
i = 1
while i <= end_page_num:
r = requests.get("https://azetshop.ro/12-extensa?page={}".format(i))
soup = BeautifulSoup(r.text, "html5lib")
x = soup.find_all("div", {'class': 'thumbnail-container'})
for thumbnail in x:
descriere = thumbnail.find('h1', {"class": "h3 product-title"}).text.strip()
pret = thumbnail.find('span', {"class": "price"}).text.strip()
writer.writerow([descriere, pret, datetime.datetime.now()])
i += 1
Here i
will change with increment of 1
as scraping of a page is completed.
This will continue scraping till end_page_num
you have defined.
Upvotes: 2