Reputation: 1741
I want to extract text "3351500920037" from the following code:
<div class="specs">
<h3 class="h4">Productinformatie</h3>
<dl class="specs__list">
<dt class="specs__title">
Gewicht
</dt>
<dd class="specs__value">
0,3 kg
</dd>
<dt class="specs__title">
EAN
</dt>
<dd class="specs__value">
3351500920037
</dd>
</dl>
</div>
I use
ref_code = driver.find_element_by_xpath('//*[contains(text(),"EAN")]/following-sibling::dd').text
When I print ref_code, it appears empty; it seems to pick up only the first line of the text.
What I have:
print(ref_code)
I would like to have:
print(ref_code)
3351500920037
How can I take the whole text including next lines?
Upvotes: 1
Views: 9165
Reputation: 33384
The element is not visible on the page, which is why visibility_of_element_located() raises a timeout exception.
To extract the text 3351500920037 you need to induce WebDriverWait with presence_of_element_located() and then read get_attribute('textContent'); that will give you the result you are looking for:
print(WebDriverWait(driver,20).until(EC.presence_of_element_located((By.XPATH, "//*[contains(.,'EAN')]/following-sibling::dd[1]"))).get_attribute('textContent'))
This is the full code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://www.bol.com/")
query='Azzaro Chrome 100 ml'
searchelement=WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.ID,"searchfor")))
searchelement.send_keys(query)
searchelement.submit()
WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.CSS_SELECTOR,".product-title.px_list_page_product_click"))).click()
print(WebDriverWait(driver,20).until(EC.presence_of_element_located((By.XPATH, "//*[contains(.,'EAN')]/following-sibling::dd[1]"))).get_attribute('textContent'))
driver.quit()
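For reference, the dt/dd pairing that the XPath above relies on can be checked offline with just the standard library (no browser involved); this is only a sketch against the HTML snippet from the question:

```python
import xml.etree.ElementTree as ET

# The markup from the question, compacted into well-formed XML.
html = (
    '<div class="specs">'
    '<h3 class="h4">Productinformatie</h3>'
    '<dl class="specs__list">'
    '<dt class="specs__title">Gewicht</dt>'
    '<dd class="specs__value">0,3 kg</dd>'
    '<dt class="specs__title">EAN</dt>'
    '<dd class="specs__value">3351500920037</dd>'
    '</dl>'
    '</div>'
)

root = ET.fromstring(html)
# Pair each <dt> label with the <dd> that immediately follows it,
# mirroring the following-sibling::dd[1] step in the XPath.
children = list(root.find("dl"))
specs = {}
for i, el in enumerate(children):
    if el.tag == "dt":
        specs[el.text.strip()] = children[i + 1].text.strip()

print(specs["EAN"])  # 3351500920037
```

The same sibling relationship is what the Selenium locator resolves in the live DOM; textContent then returns the node's text whether or not the element is rendered.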
Upvotes: 1
Reputation: 12255
Here is code showing how you can get all the EAN numbers from the first search page. You can improve it by first going through all the pages to collect all the links:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('/usr/local/bin/chromedriver')
wait = WebDriverWait(driver, 20)
query = "Azzaro Chrome 100 ml"
driver.get("https://www.bol.com")
driver.find_element_by_id("searchfor").send_keys(query, u'\ue007')
# wait presence and get all product A elements
products = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.product-item--row a.product-title")))
# get HREF attribute from products
product_links = [product.get_attribute("href") for product in products]
# iterate through and open all product links, and get ref_code
for link in product_links:
    driver.get(link)
    ref_code = driver.find_element_by_css_selector("a[data-ean]").get_attribute("data-ean")
    print(ref_code)
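If the collected codes should end up in a file rather than just printed, the standard csv module covers it; a minimal sketch, with hypothetical example values standing in for the scraped codes:

```python
import csv

# Hypothetical values; in the script above each ref_code comes from
# get_attribute("data-ean") on a product page.
ref_codes = ["3351500920037", "8714100000000"]

with open("ean_codes.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ean"])              # header row
    writer.writerows([c] for c in ref_codes)
```

Appending each code inside the for loop instead of printing it would produce the same file incrementally.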
Upvotes: 2