Reputation: 469
I wrote a Python script that scrapes product data from Flipkart.
I need to load multiple result pages so that I can collect many products, but right now only the first page of products is scraped.
import csv
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.xxxxxx.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page=1'

# download the first results page
uClient2 = uReq(my_url)
page_html = uClient2.read()
uClient2.close()

page_soup = soup(page_html, "html.parser")
# one container per product on the page
containers11 = page_soup.find_all("div", {"class": "_3O0U0u"})

filename = "FoodProcessor.csv"
with open(filename, "w", newline="", encoding="utf-8-sig") as f:
    # csv.writer quotes fields that contain commas (e.g. in the description)
    writer = csv.writer(f)
    writer.writerow(["Product", "Price", "Description"])
    for container in containers11:
        title_container = container.find_all("div", {"class": "_3wU53n"})
        product_name = title_container[0].text
        price_con = container.find_all("div", {"class": "_1vC4OE _2rQ-NK"})
        price = price_con[0].text
        description_container = container.find_all("ul", {"class": "vFw0gD"})
        product_description = description_container[0].text
        print("Product: " + product_name)
        print("Price: " + price)
        print("Description: " + product_description)
        writer.writerow([product_name, price.replace(",", ""), product_description])
Upvotes: 0
Views: 251
Reputation: 118
In my opinion, the easiest way is to add an outer loop over a "page" variable and put that variable into the URL:
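# check the website for the number of the last page and put it here
last_page = 10
page = 1
while page <= last_page:
    print(f'Scraping page: {page}')
    # note the f prefix: without it, {page} is not substituted into the URL
    my_url = f'https://www.xxxxxx.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page={page}'
    # here, download my_url and run the for loop you already have
    page += 1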
This method should work.
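A variation on the same idea, sketched with the standard library only: set the "page" query parameter with urllib.parse instead of editing the string by hand, and stop early when a page returns no products. The names page_url, scrape_pages and fetch_products are hypothetical; fetch_products stands for whatever download-and-parse code you already have (for example the BeautifulSoup loop from the question), and max_pages is an assumed upper bound.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def page_url(base_url, page):
    """Return base_url with its 'page' query parameter set to `page`."""
    parts = urlparse(base_url)
    query = parse_qs(parts.query)
    query["page"] = [str(page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def scrape_pages(base_url, fetch_products, max_pages=50):
    """Call fetch_products(url) for page 1, 2, ... until a page is empty.

    fetch_products is a hypothetical callback: it should download the URL
    and return a list of the products found on that page.
    """
    all_products = []
    for page in range(1, max_pages + 1):
        products = fetch_products(page_url(base_url, page))
        if not products:  # empty page: we are past the last page
            break
        all_products.extend(products)
    return all_products
```

Some sites may redirect past-the-end page numbers to the last page instead of returning an empty one, which is why max_pages is kept as a safety cap.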
Upvotes: 0
Reputation: 469
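With Selenium you can keep clicking the "Next" button until it no longer exists, hiding any overlay that intercepts the click:
from selenium.common.exceptions import ElementClickInterceptedException, TimeoutException
from selenium.webdriver.common.by import By

# driver is an already-created WebDriver that has opened the first results page
try:
    while True:
        # ... scrape the products on the current page here ...
        try:
            next_btn = driver.find_element(By.XPATH, "//a//span[text()='Next']")
            next_btn.click()
        except ElementClickInterceptedException:
            # an overlay covers the button: hide it, then click again
            classes = "_3ighFh"
            overlay = driver.find_element(By.XPATH, "(//div[@class='{}'])[last()]".format(classes))
            driver.execute_script("arguments[0].style.visibility = 'hidden'", overlay)
            next_btn = driver.find_element(By.XPATH, "//a//span[text()='Next']")
            next_btn.click()
        except Exception as e:
            # no "Next" button (last page) or another error: stop paginating
            print(e)
            break
except TimeoutException:
    print("Page Timed Out")
driver.quit()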
Upvotes: 0
Reputation: 563
You have to check whether the next-page button exists. If it does, return True, go to that next page and start scraping; if not, return False and move on to the next container. Check the class name of that button first.
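from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# check whether a pagination ("next page") button exists on the current page
def go_next_page():
    try:
        button = driver.find_element(By.XPATH, '//a[@class="<class name>"]')
        return True, button
    except NoSuchElementException:
        return False, None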
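Putting that check into a loop might look like the sketch below. It assumes a go_next_page-style function that returns (found, button) as above, plus a hypothetical scrape_current_page callback holding your parsing code; both are passed in as arguments so the loop itself does not depend on Selenium. max_pages is an assumed safety limit.

```python
def scrape_with_next_button(scrape_current_page, go_next_page, max_pages=100):
    """Scrape the current page, then click 'next' until no button is found."""
    results = []
    for _ in range(max_pages):
        results.extend(scrape_current_page())
        found, button = go_next_page()
        if not found:  # no next-page button: this was the last page
            break
        button.click()  # move to the next page before scraping it
    return results
```

With a real browser you will usually want to wait for the new page to load after click(), for example with Selenium's WebDriverWait, before scraping it.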
Upvotes: 1
Reputation: 356
You can first get the number of pages available, then iterate over each page and parse its data.
This works when the page number is part of the URL, as with the page= parameter here.
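One way to get the page count, sketched under an assumption: that the results page shows a label such as "Page 1 of 12" (check the actual page, since the wording and markup may differ). last_page_from_label and page_urls are hypothetical helpers; once you have the number, substitute each value from 1 to the last page into the page= parameter of the URL.

```python
import re

# the question's URL with the page number left as a placeholder
BASE_URL = "https://www.xxxxxx.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page={}"


def last_page_from_label(label):
    """Extract the total page count from text like 'Page 1 of 12'."""
    match = re.search(r"Page\s+\d+\s+of\s+([\d,]+)", label)
    if match is None:
        return 1  # no pagination label found: assume a single page
    return int(match.group(1).replace(",", ""))


def page_urls(label):
    """Build one URL per results page."""
    last_page = last_page_from_label(label)
    return [BASE_URL.format(page) for page in range(1, last_page + 1)]
```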
Upvotes: 0