Dheeraj Gupta

Reputation: 469

Load multiple pages when web scraping with Python

I wrote Python code for web scraping so that I can import data from Flipkart.
I need to load multiple pages so that I can import many products, but right now only one product page is being scraped.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup


my_url = 'https://www.xxxxxx.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page=1'

uClient2 = uReq(my_url)
page_html = uClient2.read()
uClient2.close()

page_soup = soup(page_html, "html.parser")

# each product card on the results page
containers11 = page_soup.findAll("div", {"class": "_3O0U0u"})

filename = "FoodProcessor.csv"
f = open(filename, "w", encoding='utf-8-sig')
headers = "Product,Price,Description\n"
f.write(headers)

for container in containers11:
    title_container = container.findAll("div", {"class": "_3wU53n"})
    product_name = title_container[0].text

    price_con = container.findAll("div", {"class": "_1vC4OE _2rQ-NK"})
    price = price_con[0].text

    description_container = container.findAll("ul", {"class": "vFw0gD"})
    product_description = description_container[0].text

    print("Product: " + product_name)
    print("Price: " + price)
    print("Description: " + product_description)
    f.write(product_name + "," + price.replace(",", "") + "," + product_description + "\n")

f.close()

Upvotes: 0

Views: 251

Answers (4)

Farid Mammadaliyev

Reputation: 118

For me, the easiest way is to add an extra loop with the "page" variable:

# just check the number of the last page on the website
page = 1

while page != 10:
    print(f'Scraping page: {page}')
    # note the f prefix, so {page} is actually substituted into the URL
    my_url = f'https://www.xxxxxx.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page={page}'

    # here add the for loop you already have

    page += 1

This method should work.
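For completeness, a minimal sketch of that loop wrapped around the parsing code from the question (the class names and the nine-page limit come from the question and the loop above; check the real last page number on the site):

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

base_url = 'https://www.xxxxxx.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page={}'

with open("FoodProcessor.csv", "w", encoding='utf-8-sig') as f:
    f.write("Product,Price,Description\n")
    for page in range(1, 10):  # pages 1..9, matching "while page != 10" above
        print(f'Scraping page: {page}')
        uClient = uReq(base_url.format(page))
        page_soup = soup(uClient.read(), "html.parser")
        uClient.close()
        # reuse the per-product parsing from the question on every page
        for container in page_soup.findAll("div", {"class": "_3O0U0u"}):
            product_name = container.findAll("div", {"class": "_3wU53n"})[0].text
            price = container.findAll("div", {"class": "_1vC4OE _2rQ-NK"})[0].text
            description = container.findAll("ul", {"class": "vFw0gD"})[0].text
            f.write(product_name + "," + price.replace(",", "") + "," + description + "\n")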

Upvotes: 0

Dheeraj Gupta

Reputation: 469

from selenium.common.exceptions import (
    ElementClickInterceptedException,
    TimeoutException,
)

try:
    # keep clicking "Next" until it can no longer be clicked
    while True:
        try:
            next_btn = driver.find_element_by_xpath("//a//span[text()='Next']")
            next_btn.click()
        except ElementClickInterceptedException:
            # an overlay intercepts the click: hide it and retry
            classes = "_3ighFh"
            overlay = driver.find_element_by_xpath("(//div[@class='{}'])[last()]".format(classes))
            driver.execute_script("arguments[0].style.visibility = 'hidden'", overlay)
            next_btn = driver.find_element_by_xpath("//a//span[text()='Next']")
            next_btn.click()
        except Exception as e:
            # no "Next" button left (or another failure): stop paging
            print(str(e))
            break
except TimeoutException:
    print("Page Timed Out")

driver.quit()
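The snippet assumes driver was already created and pointed at the first results page, along these lines (the choice of Chrome is an assumption):

from selenium import webdriver

# assumption: Chrome, with chromedriver available on PATH
driver = webdriver.Chrome()
driver.get('https://www.flipkart.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page=1')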

Upvotes: 0

rmb

Reputation: 563

You have to check whether the next-page button exists. If it does, return True along with the button, so you can go to that next page and keep scraping; if it doesn't, return False. Check the class name of that button first.

from selenium.common.exceptions import NoSuchElementException

# check if a pagination ("next") button exists on the page
def go_next_page():
    try:
        button = driver.find_element_by_xpath('//a[@class="<class name>"]')
        return True, button
    except NoSuchElementException:
        return False, None
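A minimal usage sketch, where scrape_current_page() is a hypothetical stand-in for the parsing logic:

# hypothetical paging loop built on go_next_page()
while True:
    scrape_current_page()  # hypothetical: parse the products on the current page
    has_next, button = go_next_page()
    if not has_next:
        break
    button.click()  # advance to the next results page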

Upvotes: 1

Mandar Autade

Reputation: 356

You can first get the number of pages available, then iterate over each page and parse its data.

Notice how the URL changes with respect to the page (a sketch follows the list):

  • 'https://www.flipkart.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page=1' which points to page 1
  • 'https://www.flipkart.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page=2' which points to page 2
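A minimal sketch of that idea; the "Page 1 of N" pagination label used here to discover the page count is an assumption about the site's markup:

import re
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

base_url = 'https://www.flipkart.com/food-processors/pr?sid=j9e%2Cm38%2Crj3&page={}'

# assumption: the results page shows a "Page 1 of N" pagination label
first_page = soup(uReq(base_url.format(1)).read(), "html.parser")
match = re.search(r'Page 1 of (\d+)', first_page.get_text())
num_pages = int(match.group(1)) if match else 1

for page in range(1, num_pages + 1):
    page_html = uReq(base_url.format(page)).read()
    # ...parse page_html with the BeautifulSoup code from the question...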

Upvotes: 0
