Reputation: 61
I am scraping a review page using Selenium in Python. I want to extract the rating of each review (i.e., extract 7 from 7/10). The HTML is structured like this:
<div class="review">
  <div class="rating-bar">
    <span class="user-rating">
      <svg class="ipl-icon ipl-star-icon" xmlns="http://www.w3.org/2000/svg" fill="#000000" height="24" viewBox="0 0 24 24" width="24">
        <path d="M0 0h24v24H0z" fill="none"></path>
        <path d="M12 17.27L18.18 21l-1.64-7.03L22 9.24l-7.19-.61L12 2 9.19 8.63 2 9.24l5.46 4.73L5.82 21z"></path>
        <path d="M0 0h24v24H0z" fill="none"></path>
      </svg>
      <span>7</span> <!-- What I want to extract -->
      <span class="scale">/10</span>
    </span>
  </div>
</div>
The element I want has no class name of its own, so I assume I have to reach it through the class user-rating on the enclosing span tag:
rating = driver.find_elements_by_class_name('user-rating')
But how do I extract a span tag nested inside another span tag when I cannot refer to it by class name?
In addition, not every review contains a rating, so when the scraper reaches a review without one, it raises this error:
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".rating-other-user-rating"} (Session info: chrome=87.0.4280.66)
This is what I have tried out so far:
review = driver.find_elements_by_class_name("review")
rating_ls = []
for i in review:
    rating = i.find_element_by_class_name('rating-other-user-rating').text
    # If rating exists, append it to the list, otherwise append "N/A"
    rating_ls.append(rating[0] if rating else "N/A")
I would appreciate it if anyone could help me with this. Thanks a lot in advance!
Upvotes: 2
Views: 1216
Reputation: 690
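Try waiting for the elements (they are probably added by JavaScript code). Inside the loop, use find_elements rather than find_element: it returns an empty list instead of raising NoSuchElementException when a review has no rating: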
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
reviews = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "review-container")))
for review in reviews:
    # find_elements(By..., ...) works in Selenium 3 and 4; the
    # find_elements_by_* helpers are deprecated in Selenium 4.
    _rating = review.find_elements(By.CLASS_NAME, 'rating-other-user-rating')
    rating = _rating[0].text if _rating else 'N/A'
    _comment = review.find_elements(By.CLASS_NAME, 'content')
    comment = _comment[0].text if _comment else 'N/A'
    print(rating + ": " + comment)
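The empty-list fallback above can be pulled into a small helper. This is only a sketch: first_text and parse_score are hypothetical names, not part of Selenium, and the pattern assumes the element's text looks like "7/10". It also avoids taking the first character of the text, which would turn "10/10" into "1".

```python
import re


def first_text(parent, by, value, default="N/A"):
    """Text of the first element matching (by, value) under parent, else default."""
    # find_elements returns an empty list when nothing matches,
    # so no NoSuchElementException has to be caught.
    matches = parent.find_elements(by, value)
    return matches[0].text if matches else default


def parse_score(text, default="N/A"):
    """Pull the score out of text such as '7/10' (hypothetical helper)."""
    # Match all leading digits so that '10/10' gives 10, not 1.
    match = re.match(r"\s*(\d+)\s*/\s*10", text)
    return int(match.group(1)) if match else default


# Usage with a live driver (assumes `reviews` is a list of WebElements):
#   from selenium.webdriver.common.by import By
#   scores = [parse_score(first_text(r, By.CLASS_NAME, "rating-other-user-rating"))
#             for r in reviews]
```

Because first_text only relies on a find_elements(by, value) method, it works on the driver itself as well as on an individual review element.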
Upvotes: 1
Reputation: 193108
To extract the rating of each review (i.e., extract 7 from 7/10) using Selenium and Python, wait with WebDriverWait for visibility_of_all_elements_located(), then use either of the following locator strategies:
Using XPATH, a span index, and the text attribute (span[1] selects the first span child of user-rating, which holds the score):
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='review']//span[@class='user-rating']/span[1]")))])
Using XPATH, a class-attribute filter, and get_attribute() (note the // before span[@class='user-rating'], since that span is not a direct child of the review div):
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='review']//span[@class='user-rating']//span[not(contains(@class,'scale'))]")))])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
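If you want to sanity-check the nested-span path without a browser, here is a rough sketch using the standard library's xml.etree.ElementTree, which supports a subset of XPath. The SAMPLE markup is a simplified, well-formed stand-in for the HTML in the question (SVG icon omitted, a second review without a rating added), and ratings_from_markup is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the page markup; an assumption, not the real page.
SAMPLE = """
<div>
  <div class="review">
    <div class="rating-bar">
      <span class="user-rating"><span>7</span><span class="scale">/10</span></span>
    </div>
  </div>
  <div class="review">
    <p>No rating here</p>
  </div>
</div>
"""


def ratings_from_markup(markup):
    """Return one entry per review: the score text, or 'N/A' if absent."""
    root = ET.fromstring(markup)
    ratings = []
    for review in root.findall(".//div[@class='review']"):
        # The first <span> child of the user-rating span holds the score.
        score = review.find(".//span[@class='user-rating']/span[1]")
        ratings.append(score.text if score is not None else "N/A")
    return ratings


print(ratings_from_markup(SAMPLE))
```

The same per-review idea carries over to Selenium: search inside each review element with a path that starts with a dot (for example ".//span[@class='user-rating']/span[1]") so the lookup stays relative to that review, and fall back to "N/A" when nothing is found.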
Link to useful documentation:
get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Upvotes: 0