QMan5

Reputation: 779

Trying to get score from Rotten Tomatoes URL

I'm trying to get each review's original score from this link: https://www.rottentomatoes.com/m/avengers_endgame/reviews

Something is returned, but it's not what I'm expecting. I get a list of elements like this:

[<selenium.webdriver.remote.webelement.WebElement (session="e31633b4d123fff6432c6726cc26ad65", element="a9b96431-d527-4b79-8ddd-01d18f362cbd")>

(I just gave an example of one, but there is a list of these)

Here's what I'm using to get this result:

rows = driver.find_elements_by_xpath("//div[@class='row review_table_row']")
scores =[]
for row in rows:
    scores.append(row.find_elements_by_xpath('//div[contains(concat(" ",normalize-space(@class)," ")," review_desc ")]//div[contains(concat(" ",normalize-space(@class)," ")," small ")]'))

Does anyone know what's wrong with my XPath?

To illustrate, what I'm trying to get from this website is the "Original Score" value shown with each review.

Upvotes: 1

Views: 289

Answers (2)

wp78de

Reputation: 18960

You could use a more specific XPath that only matches rows containing a score in the first place:

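# `driver` is an already-created Selenium WebDriver instance
driver.get("https://www.rottentomatoes.com/m/avengers_endgame/reviews")
# Selenium 4.3+ removed find_elements_by_xpath; use driver.find_elements("xpath", ...) there
rows = driver.find_elements_by_xpath("//div[@class='row review_table_row' and contains(., 'Original Score:')]")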

Then the extraction becomes easy:

scores = []
for row in rows:
    # row.text is the visible text of the row; keep what follows the label
    scores.append(row.text.split('Original Score: ')[1])

Out[20]: ['2.5/4.0', '4/5', '9/10', '3/5', '2.5/5', '5 / 5', '3.5/4', '4/4', '4/5', '4/5', 'B+', '3.5/4', 'B+', '7/10', '9/10']
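If you would rather not rely on the XPath filter alone, here is a minimal sketch of a more defensive extraction step. `extract_original_score` is a hypothetical helper name, and the sample strings are made-up stand-ins for `row.text`; I'm assuming the score sits on the same line as the label and may be followed by a `|` separator.

```python
import re

# Hypothetical helper (not part of Selenium). Assumes the score follows the
# "Original Score:" label on the same line, optionally ending at a "|".
_SCORE_RE = re.compile(r"Original Score:[ \t]*([^|\n]+)")


def extract_original_score(row_text):
    """Return the score text after the label, or None if the row has no score."""
    match = _SCORE_RE.search(row_text)
    return match.group(1).strip() if match else None


# Made-up examples of what row.text might look like
sample_rows = [
    "A fun ride.\nFull Review | Original Score: 2.5/4.0",
    "No score given.\nFull Review",
    "Great.\nFull Review | Original Score: 5 / 5\n",
]

# Keep only the rows that actually carry a score
scores = [s for s in map(extract_original_score, sample_rows) if s is not None]
```

With Selenium you would pass `row.text` for each row instead of the sample strings. Rows without a score are skipped rather than raising an `IndexError`.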

Upvotes: 1

tudopropaganda

Reputation: 138

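If you are using Scrapy rather than Selenium (so `response` is a Scrapy response object), you can try this, which takes the first matching element: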

response.xpath('//div[@class="small subtle review-link"]').get().split('Original Score: ')[1].split('\n')[0]

The result would be:

'2.5/4.0'
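Note that the one-liner fails if nothing matches (Scrapy's `.get()` returns `None`) or if the matched element has no score. Here is a rough sketch of the same split logic with those cases guarded. `score_from_fragment` is a hypothetical name, and `fragment` is a made-up stand-in for the HTML string that `.get()` might return, not the page's actual markup.

```python
# Made-up stand-in for the HTML string returned by response.xpath(...).get()
fragment = (
    '<div class="small subtle review-link">\n'
    '    <a href="#">Full Review</a>\n'
    '    | Original Score: 2.5/4.0\n'
    '</div>'
)


def score_from_fragment(html):
    """Apply the same split chain, returning None when no score is present."""
    if html is None:  # .get() found no matching element
        return None
    parts = html.split('Original Score: ')
    if len(parts) < 2:  # label not present in this element
        return None
    # Keep only the rest of the line that follows the label
    return parts[1].split('\n')[0].strip()


score = score_from_fragment(fragment)
```

To collect every score instead of just the first, the same function could be applied to each string from `response.xpath(...).getall()`.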

Upvotes: 1
