Reputation: 964
I am learning Selenium and trying to extract the Manufacturer value from an Amazon product page.
On that page, the Manufacturer info is inside an unordered list. How do I extract it with Selenium?
I tried this code, but it does not work:
try:
    manufacturer_element = WebDriverWait(driver, 5).until(
        EC.presence_of_element_located((By.XPATH, "//ul//span[text()='Manufacturer']/ancestor::li")))
    manufacturer_text = manufacturer_element.text.split(':')[1].strip()
    return manufacturer_text
except TimeoutException:
    return None
This is the markup of the list:
<ul class="a-unordered-list a-nostyle a-vertical a-spacing-none detail-bullet-list">
<li><span class="a-list-item">
<span class="detail-bullet-label a-text-bold">Is Discontinued By Manufacturer
:
</span>
<span>No</span>
</span></li>
<li><span class="a-list-item">
<span class="detail-bullet-label a-text-bold">Package Dimensions
:
</span>
<span>10 x 4 x 4 inches</span>
</span></li>
<li><span class="a-list-item">
<span class="detail-bullet-label a-text-bold">Item model number
:
</span>
<span>BHBUSWA2918</span>
</span></li>
<li><span class="a-list-item">
<span class="detail-bullet-label a-text-bold">UPC
:
</span>
<span>874989001644</span>
</span></li>
<li><span class="a-list-item">
<span class="detail-bullet-label a-text-bold">Manufacturer
:
</span>
<span>Wonder Bread</span>
</span></li>
<li><span class="a-list-item">
<span class="detail-bullet-label a-text-bold">ASIN
:
</span>
<span>B0038EUT9W</span>
</span></li>
</ul>
From this list I want to extract the value "Wonder Bread".
Thanks in advance
Upvotes: 1
Views: 222
Reputation: 808
The problem is the XPath expression. The text of the label span is not exactly "Manufacturer": it also contains the colon, so the exact comparison text()='Manufacturer' never matches.
In addition, the page source shows that the span's text contains whitespace (newlines) around the colon, so any match has to allow for that.
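The mismatch can be reproduced in plain Python without a browser. This is only a sketch: `xpath_normalize_space` is a hypothetical helper that approximates XPath's normalize-space(), and `label_text` assumes the span's text node looks like the markup posted in the question.

```python
import re


def xpath_normalize_space(value: str) -> str:
    """Approximate XPath normalize-space(): trim, then collapse whitespace runs."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" \t\r\n")


# Assumed text node of the label span, based on the markup in the question.
label_text = "Manufacturer\n:\n"

# What text()='Manufacturer' effectively tests: an exact string comparison.
exact_match = label_text == "Manufacturer"

# What starts-with(text(), 'Manufacturer') effectively tests.
prefix_match = label_text.startswith("Manufacturer")

# What normalize-space() yields before it is compared with 'Manufacturer :'.
normalized = xpath_normalize_space(label_text)

print(exact_match)       # False
print(prefix_match)      # True
print(repr(normalized))  # 'Manufacturer :'
```

The exact comparison fails because of the trailing colon and newlines, while a prefix test or a whitespace-normalized comparison succeeds.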
You can fix the XPath like this:
"//ul//span[starts-with(text(), 'Manufacturer')]/ancestor::li"
or
"//ul//span[normalize-space() = 'Manufacturer :']/ancestor::li"
Upvotes: 0
Reputation: 7563
Try locating the element with By.CSS_SELECTOR:
try:
    manufacturer_element = WebDriverWait(driver, 5).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "div#detailBullets_feature_div > ul > li:nth-child(5)")))
    manufacturer_text = manufacturer_element.text.split(':')[1].strip()
    return manufacturer_text
except TimeoutException:
    return None
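Here li:nth-child(5) selects the fifth list item, which is the Manufacturer entry in this list. The selector depends on position, so it stops working if the items appear in a different order on another page.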
Or with this XPath, which selects the value span directly:
try:
    manufacturer_text = WebDriverWait(driver, 5).until(
        EC.presence_of_element_located((By.XPATH, "//span[normalize-space() = 'Manufacturer :']/following-sibling::span"))).text
    return manufacturer_text
except TimeoutException:
    return None
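If the position of the Manufacturer item varies between pages, one alternative is to read the text of every list item and build a label-to-value mapping. The sketch below is not part of either answer: `parse_detail_bullets` is a hypothetical helper, and stripping the invisible direction marks (U+200E, U+200F) rests on the assumption that some pages embed them in the labels.

```python
# Assumption: labels may contain invisible left-to-right / right-to-left marks.
_DIRECTION_MARKS = str.maketrans("", "", "\u200e\u200f")


def parse_detail_bullets(item_texts):
    """Map each 'Label : Value' string to a {label: value} entry."""
    details = {}
    for raw in item_texts:
        # Split on the first colon only, so values that contain ':' stay intact.
        label, sep, value = raw.translate(_DIRECTION_MARKS).partition(":")
        if sep:  # skip items that have no colon at all
            details[" ".join(label.split())] = " ".join(value.split())
    return details


# Visible text of the list items, as Selenium's .text would roughly return it.
sample = [
    "Is Discontinued By Manufacturer : No",
    "Package Dimensions : 10 x 4 x 4 inches",
    "Manufacturer : Wonder Bread",
    "ASIN : B0038EUT9W",
]
details = parse_detail_bullets(sample)
print(details.get("Manufacturer"))  # Wonder Bread
```

With Selenium, the input list could be collected with something like [li.text for li in driver.find_elements(By.CSS_SELECTOR, "ul.detail-bullet-list > li")], using the class name from the markup in the question. Looking up the exact key "Manufacturer" also avoids confusing it with "Is Discontinued By Manufacturer".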
Upvotes: 1