cwyjm

Reputation: 61

Scraping a span tag that has no class name and does not appear in every element

I am scraping a review page using Selenium in Python. I want to extract the rating of each review (i.e. extract 7 from 7/10). The HTML is structured like this:

    <div class="review">
        <div class="rating-bar">
            <span class="user-rating">
                <svg class="ipl-icon ipl-star-icon" xmlns="http://www.w3.org/2000/svg" fill="#000000" height="24" viewBox="0 0 24 24" width="24">
                    <path d="M0 0h24v24H0z" fill="none"></path>
                    <path d="M12 17.27L18.18 21l-1.64-7.03L22 9.24l-7.19-.61L12 2 9.19 8.63 2 9.24l5.46 4.73L5.82 21z"></path>
                    <path d="M0 0h24v24H0z" fill="none"></path>
                </svg>
                <span>7</span>  <!-- what I want to extract -->
                <span class='scale'>/10</span>
            </span>
        </div>
    </div>

The span that holds the rating has no class name, so I assume I have to reach it through its parent span, which has the class user-rating:

    rating = driver.find_elements_by_class_name('user-rating')

But how do I extract a span nested inside another span? I cannot refer to it by class name.

In addition, not every review has a rating, so when the loop reaches a review without one, it raises this error:

    NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".rating-other-user-rating"} (Session info: chrome=87.0.4280.66)

This is what I have tried out so far:

    review = driver.find_elements_by_class_name("review")
    rating_ls = []
    
    for i in review:
        rating = i.find_element_by_class_name('rating-other-user-rating').text
        # If rating exists, append it to the list, otherwise append "N/A" 
        rating_ls.append(rating[0] if rating else "N/A")   

I would appreciate it if anyone could help me with this. Thanks a lot in advance!

Upvotes: 2

Views: 1216

Answers (2)

DonnyFlaw

Reputation: 690

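Try waiting for the elements (they are probably added by JavaScript), and use find_elements, which returns an empty list instead of raising an exception when nothing matches: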

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Wait up to 10 seconds for the review containers to be present in the DOM
    reviews = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "review-container")))

    for review in reviews:
        # find_elements returns [] when a review has no rating, so nothing is raised
        _rating = review.find_elements(By.CLASS_NAME, 'rating-other-user-rating')
        rating = _rating[0].text if _rating else 'N/A'
        _comment = review.find_elements(By.CLASS_NAME, 'content')
        comment = _comment[0].text if _comment else 'N/A'
        print(rating + ": " + comment)
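The same "find all, then check for an empty list" pattern can also target the class-less inner span directly. The following is only a sketch, assuming the markup matches the snippet in the question (on the real page the class names may differ, for example rating-other-user-rating). The names `CSS`, `RATING_SELECTOR`, `extract_rating` and `extract_ratings` are made up for illustration, and the locator is passed as the plain string "css selector", which is the value of Selenium's `By.CSS_SELECTOR`.

```python
# Value of selenium.webdriver.common.by.By.CSS_SELECTOR, written out as a string
# so this helper has no import-time dependency on Selenium.
CSS = "css selector"

# Assumption: the rating is the direct child <span> of span.user-rating
# that does NOT carry the "scale" class (as in the question's snippet).
RATING_SELECTOR = "span.user-rating > span:not(.scale)"


def extract_rating(review, default="N/A"):
    """Return the rating text of one review element, or `default` if it has none."""
    # find_elements (plural) returns an empty list when nothing matches,
    # so a review without a rating does not raise NoSuchElementException.
    matches = review.find_elements(CSS, RATING_SELECTOR)
    if not matches:
        return default
    # .text can be empty (e.g. for a hidden element); fall back to default then.
    return matches[0].text.strip() or default


def extract_ratings(reviews, default="N/A"):
    """Return one rating (or `default`) per review, preserving review order."""
    return [extract_rating(review, default) for review in reviews]
```

With the `reviews` list from the snippet above, `rating_ls = extract_ratings(reviews)` would give one entry per review, so ratings stay aligned with their reviews even when some are missing.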

Upvotes: 1

undetected Selenium

Reputation: 193108

To extract the rating of each review (i.e. 7 from 7/10) using Selenium, induce WebDriverWait for visibility_of_all_elements_located() and use either of the following locator strategies:

  • Using XPATH, span index and text attribute:

    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='review']//span[@class='user-rating']/span[1]")))])
    
  • Using XPATH, attribute and get_attribute():

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='review']//span[@class='user-rating']/span[not(contains(@class,'scale'))]")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
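
An XPath like this can be checked offline before pointing it at the live page. The sketch below is not Selenium code: it uses Python's standard-library `xml.etree.ElementTree`, which supports only a small XPath subset, against a simplified, hypothetical copy of the question's markup (the svg is emptied and a second review without a rating is added). `SAMPLE` and `ratings_from_markup` are illustrative names. It looks up the first child span of span.user-rating relative to each review, so a review with no rating yields "N/A" rather than being skipped.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the page: one review with a rating,
# one review without. This is an assumption, not the real page source.
SAMPLE = """
<div class="page">
  <div class="review">
    <div class="rating-bar">
      <span class="user-rating">
        <svg class="ipl-icon ipl-star-icon"></svg>
        <span>7</span>
        <span class="scale">/10</span>
      </span>
    </div>
  </div>
  <div class="review">
    <div class="content">No rating here</div>
  </div>
</div>
"""


def ratings_from_markup(markup):
    """Return one rating string per review, or "N/A" where the span is missing."""
    root = ET.fromstring(markup.strip())
    ratings = []
    for review in root.findall(".//div[@class='review']"):
        # Relative lookup: first <span> child of span.user-rating inside this review.
        span = review.find(".//span[@class='user-rating']/span[1]")
        if span is not None and span.text:
            ratings.append(span.text.strip())
        else:
            ratings.append("N/A")
    return ratings


print(ratings_from_markup(SAMPLE))  # ['7', 'N/A']
```

Searching per review keeps each rating aligned with its review. A single page-wide XPath returns only the ratings that exist, so the positions of reviews without a rating are lost.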
    


Upvotes: 0
