Reputation: 1
I would like to scrape the car reviews on the webpage below for personal interest:
www.cardekho.com/user-reviews/maruti-alto-800
I succeeded in scraping the reviews on the first page with the code below:
pip install selenium
pip install webdriver-manager
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Let webdriver-manager download a chromedriver that matches the installed Chrome
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
url = 'https://www.cardekho.com/user-reviews/maruti-alto-800'
driver.get(url)
reviews = driver.find_elements(By.CSS_SELECTOR, ".contentspace")
for i in reviews:
    i_title = i.find_element(By.CSS_SELECTOR, "h3 > a")
    i_desc = i.find_element(By.CSS_SELECTOR, "p")
    print(i_title.text, i_desc.text)
However, I cannot scrape the remaining reviews on the following pages. The page bar lists pages 1 to 16 plus a "next" link.
I tried the code below, which selects the main part of the page bar. However, clicking page_bar[0] took me to page 6, and any index above 0 gave me "list index out of range".
page_bar = driver.find_elements(By.CSS_SELECTOR, '#rf01 > div.app-content > div > div:nth-child(1) > main > div > div.gsc_col-xs-12.gsc_col-sm-12.gsc_col-md-8.gsc_col-lg-9 > div:nth-child(3) > section > div > div.marginTop20 > div > div > div > ul')
for i in page_bar:
    print(i.text)
page_bar[0].click()
Upvotes: 0
Views: 228
Reputation: 684
If you click through to the next pages, you will notice that the URL contains the page number.
E.g. page 2: https://www.cardekho.com/user-reviews/maruti-alto-800/2?subtab=latest
E.g. page 3: https://www.cardekho.com/user-reviews/maruti-alto-800/3?subtab=latest
Therefore, you only need a for loop that goes through pages 1 to 16 and changes the number in the link. That way you scrape every page without clicking the page bar at all.
For example,
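for i in range(1, 17):  # range() excludes its end value, so 17 covers pages 1-16
    CurrentLinkIs = "https://www.cardekho.com/user-reviews/maruti-alto-800/" + str(i) + "?subtab=latest"
    # load the page with driver.get(CurrentLinkIs), then perform your scraping here
    # .
    # .
    # .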
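Here is a slightly fuller sketch of the same idea, reusing the selectors from the question. The names build_page_urls and scrape_all_reviews are hypothetical helpers, not part of Selenium. The sketch also assumes that page 1 responds to the same /1?subtab=latest pattern as pages 2 and 3, and that the site still has 16 pages.

```python
BASE_URL = "https://www.cardekho.com/user-reviews/maruti-alto-800"


def build_page_urls(base_url, last_page):
    """Return the URL of every review page from 1 to last_page inclusive."""
    # range() stops before its end value, hence last_page + 1
    return [f"{base_url}/{page}?subtab=latest" for page in range(1, last_page + 1)]


def scrape_all_reviews(driver, urls):
    """Visit each URL and collect (title, text) pairs from the review blocks."""
    # Imported here so that build_page_urls can be used without Selenium installed
    from selenium.webdriver.common.by import By

    results = []
    for url in urls:
        driver.get(url)
        # Same selectors as in the question; they may break if the site changes
        for review in driver.find_elements(By.CSS_SELECTOR, ".contentspace"):
            title = review.find_element(By.CSS_SELECTOR, "h3 > a").text
            desc = review.find_element(By.CSS_SELECTOR, "p").text
            results.append((title, desc))
    return results
```

With the driver from the question you could then call scrape_all_reviews(driver, build_page_urls(BASE_URL, 16)). If the content loads slowly, you may need to add a wait after driver.get before reading the elements.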
Upvotes: 1