Sashaank

Reputation: 964

How to extract text from an unordered list with selenium

I am learning Selenium and trying to extract the Manufacturer info from an Amazon product page.

On that page, the Manufacturer info sits in an unordered list. How do I extract it with Selenium?

I tried this code, but it does not work:

try:
    manufacturer_element = WebDriverWait(driver, 5).until(
            EC.presence_of_element_located((By.XPATH, "//ul//span[text()='Manufacturer']/ancestor::li")))

    manufacturer_text = manufacturer_element.text.split(':')[1].strip()
    return manufacturer_text

except TimeoutException:
    return None

This is the markup of the list:

<ul class="a-unordered-list a-nostyle a-vertical a-spacing-none detail-bullet-list">
    <li><span class="a-list-item">
            <span class="detail-bullet-label a-text-bold">Is Discontinued By Manufacturer
            :
            </span>
            <span>No</span>
        </span></li>
    
    <li><span class="a-list-item">
            <span class="detail-bullet-label a-text-bold">Package Dimensions
            :
            </span>
            <span>10 x 4 x 4 inches</span>
        </span></li>
    
    <li><span class="a-list-item">
            <span class="detail-bullet-label a-text-bold">Item model number
            :
            </span>
            <span>BHBUSWA2918</span>
        </span></li>

    <li><span class="a-list-item">
        <span class="detail-bullet-label a-text-bold">UPC
        :
        </span>
        <span>874989001644</span>
    </span></li>

    <li><span class="a-list-item">
        <span class="detail-bullet-label a-text-bold">Manufacturer
        :
        </span>
        <span>Wonder Bread</span>
    </span></li>

    <li><span class="a-list-item">
        <span class="detail-bullet-label a-text-bold">ASIN
        :
        </span>
        <span>B0038EUT9W</span>
    </span></li>
</ul>

From the list above, I want to extract "Wonder Bread".

Thanks in advance

Upvotes: 1

Views: 222

Answers (2)

Peter Quan

Reputation: 808

The problem in your code is the XPath expression. The span's text is not exactly "Manufacturer": it also contains the colon and the whitespace around it, so text()='Manufacturer' matches nothing.

As the page source shows, the span's value includes whitespace (newlines), so the match has to account for it.

You can fix the XPath like this:

"//ul//span[starts-with(text(), 'Manufacturer')]/ancestor::li"

or

"//ul//span[normalize-space() = 'Manufacturer :']/ancestor::li"
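To see why the exact comparison fails, here is a minimal sketch that parses one list item from the question with Python's standard library instead of a browser. The normalize_space helper is a hypothetical stand-in that only approximates XPath's normalize-space(); it is not part of Selenium.

```python
import xml.etree.ElementTree as ET

# One <li> copied from the question's markup (well-formed, so an XML parser accepts it).
LI = """<li><span class="a-list-item">
        <span class="detail-bullet-label a-text-bold">Manufacturer
        :
        </span>
        <span>Wonder Bread</span>
    </span></li>"""


def normalize_space(s):
    # Hypothetical helper: trims the string and collapses whitespace runs,
    # which is roughly what XPath normalize-space() does.
    return " ".join(s.split())


item = ET.fromstring(LI).find("span")   # the outer a-list-item span
label, value = item.findall("span")     # its two child spans

raw_label = label.text
print(repr(raw_label))                                # shows the embedded newlines
print(raw_label == "Manufacturer")                    # exact match fails
print(raw_label.startswith("Manufacturer"))           # prefix match succeeds
print(normalize_space(raw_label) == "Manufacturer :") # normalized match succeeds
print(value.text)
```

The prefix and normalized comparisons correspond to the two XPath fixes above.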

Upvotes: 0

frianH

Reputation: 7563

Try to find the element with By.CSS_SELECTOR:

try:
    manufacturer_element = WebDriverWait(driver, 5).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "div#detailBullets_feature_div > ul > li:nth-child(5)")))

    manufacturer_text = manufacturer_element.text.split(':')[1].strip()
    return manufacturer_text

except TimeoutException:
    return None

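In the selector above, li:nth-child(5) refers to the Manufacturer item. Note that this depends on the item's position, so it breaks if the list order differs on another page.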

Or with this xpath:

try:
    manufacturer_text = WebDriverWait(driver, 5).until(
            EC.presence_of_element_located((By.XPATH, "//span[normalize-space() = 'Manufacturer :']/following-sibling::span"))).text
    return manufacturer_text

except TimeoutException:
    return None
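If you would rather not depend on the item's position, one option is to read the text of every list item and build a label-to-value mapping. The sketch below shows only the parsing step, and parse_detail_bullets is a hypothetical helper name. It assumes the texts come from something like [li.text for li in driver.find_elements(By.CSS_SELECTOR, "ul.detail-bullet-list > li")] and that each text looks like "Label : Value"; the sample strings are made up from the markup in the question.

```python
def parse_detail_bullets(item_texts):
    """Map each label to its value, given the visible text of every <li>."""
    details = {}
    for text in item_texts:
        # Split on the first colon only, so values that contain ':' stay intact.
        label, sep, value = text.partition(":")
        if not sep:
            continue  # no colon: not a label/value row
        # Collapse whitespace on both sides before storing.
        details[" ".join(label.split())] = " ".join(value.split())
    return details


# Assumed sample of what li.text might return for the list in the question.
sample = [
    "Is Discontinued By Manufacturer : No",
    "Package Dimensions : 10 x 4 x 4 inches",
    "Manufacturer : Wonder Bread",
    "ASIN : B0038EUT9W",
]
details = parse_detail_bullets(sample)
print(details.get("Manufacturer"))
```

Using details.get("Manufacturer") returns None when the label is missing, which matches the behaviour of the except branch above.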

Upvotes: 1
