QMan5

Reputation: 779

XPath for scraping critics' reviews from Rotten Tomatoes

I'm trying to get the reviewer names from this URL: https://www.rottentomatoes.com/m/avengers_endgame/reviews

My output currently has everything except each reviewer's name and the site they represent. Does anyone know what is wrong with the XPath I'm using to scrape the reviewers and sites? I don't think Rotten Tomatoes is blocking me, because all the other information is present in my dataframe. Only the reviewer names and their sites are missing.

Here is the code that I'm using to get reviewers and sites:

reviewers = driver.find_elements_by_xpath('//*[@id="reviews"]/div[2]/div[4]/div[1]')
for r in reviewers:
    names.append(r.find_element_by_xpath('//*[@id="reviews"]/div[2]/div[4]/div[' + str(reviewnum) + ']/div[1]/div[3]/a[1]').text)
    sites.append(r.find_element_by_xpath('//*[@id="reviews"]/div[2]/div[4]/div[' + str(reviewnum) + ']/div[1]/div[3]/a[2]/em').text)
    reviewnum += 1

I'm not entirely sure why this isn't working. Could someone let me know what I'm doing wrong?

reviewnum is just a counter that I increment on each pass through the loop.

Please let me know if you want to see more of the code if that might be helpful.

Upvotes: 0

Views: 281

Answers (1)

Sureshmani Kalirajan

Reputation: 1938

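You just need to iterate over the rows of the review table and read the child elements relative to each row. Note that the row-level XPath expressions below start with a dot (.//): an expression that starts with // is evaluated against the whole document even when it is called on an element, so it is not scoped to the current row. Long positional paths such as div[2]/div[4]/div[1] also tend to break whenever the page layout shifts, so matching on class names is usually more robust.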

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# Replace with the path to your chromedriver.
# Newer Selenium releases take a Service object instead of executable_path.
driver = webdriver.Chrome(executable_path='Your path to the driver')
driver.get("https://www.rottentomatoes.com/m/avengers_endgame/reviews")
# Wait until the review table has loaded before querying it
WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.CLASS_NAME, 'review_table')))

# One element per review row
rows = driver.find_elements(By.XPATH, "//div[@class='row review_table_row']")
print(len(rows))

reviewername = []
reviewersite = []
for row in rows:
    # The leading "." scopes each lookup to the current row
    reviewername.append(row.find_element(By.XPATH, "(.//div[contains(@class,'critic_name')]/a)[1]").text)
    reviewersite.append(row.find_element(By.XPATH, "(.//div[contains(@class,'critic_name')]/a)[2]").text)

print(reviewername)
print(reviewersite)

driver.quit()

Output:

20

['Richard Propes', 'Kelechi Ehenulo', 'Victor Pineyro', 'Stephen A. Russell', 'Matt Cipolla', 'James Hanton', 'Jason Fraley', 'Steven Prokopy', 'Saibal Chatterjee', 'Zehra Phelan', 'Allen Almachar', 'Nabila Hatimy', 'Ricardo Gallegos', 'Doug Walker', 'Brent McKnight', 'Nikki Francisco', 'Damond Fudge', 'Yasser Medina', 'Alex Hudson', 'Dan Tabor']
['TheIndependentCritic.com', 'Confessions From A Geek Mind', 'Seventh Art Studio', 'The New Daily (Australia)', 'Film Monthly', 'Outtake Mag', 'WTOP (Washington, D.C.)', 'Third Coast Review', 'NDTV', 'Flavourmag', 'The MacGuffin', 'The Star (Kenya)', 'Pólvora', 'Channel Awesome', 'The Last Thing I See', 'Cosmopolitan (Philippines)', 'KCCI (Des Moines, IA)', 'Cinemaficionados', 'Exclaim!', 'Phawker']
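
If you want to try the row-relative lookup without launching a browser, here is a minimal offline sketch using only the standard library. The SAMPLE markup is a hypothetical, heavily simplified stand-in for the review table (the real page is more complex and may change), and extract_critics is a helper name made up for this illustration. Note that xml.etree.ElementTree supports only a subset of XPath, so it uses an exact class match instead of contains().

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the review table markup (assumption).
SAMPLE = """
<div class="review_table">
  <div class="row review_table_row">
    <div class="critic_name"><a>Richard Propes</a><a><em>TheIndependentCritic.com</em></a></div>
  </div>
  <div class="row review_table_row">
    <div class="critic_name"><a>Kelechi Ehenulo</a><a><em>Confessions From A Geek Mind</em></a></div>
  </div>
</div>
"""


def extract_critics(markup):
    """Return (names, sites) by searching relative to each review row."""
    root = ET.fromstring(markup)
    names, sites = [], []
    # One element per review row.
    for row in root.findall(".//div[@class='row review_table_row']"):
        # The leading "." keeps the search inside the current row.
        links = row.findall(".//div[@class='critic_name']/a")
        # itertext() also collects text nested in child tags such as <em>.
        names.append("".join(links[0].itertext()).strip())
        sites.append("".join(links[1].itertext()).strip())
    return names, sites


names, sites = extract_critics(SAMPLE)
print(names)
print(sites)
```

The structure mirrors the Selenium loop: find the rows once, then do every per-row lookup relative to that row so the first and second links always belong to the same review.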

Upvotes: 1
