user1345331

Reputation: 121

Extract partial text from element link with python/selenium

In the HTML below, my goal is to return zzde7e35d-8d9d-4763-95d2-9198684abb12:

<div class = container>    
    <a class="Blue-Button" data-type="patch" data-disable-with="Waiting" href="/market/opening/zzde7e35d-8d9d-4763-95d2-9198684abb12">Yes</a>
</div>

The problem is that I can't even seem to locate the URL within the div:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

link = example.url
driver.get(link)
URL = driver.find_element_by_xpath('//a[contains(@href,"market")]')
print(URL)

Printing the above, I seem to get a bunch of random characters unrelated to the HTML at all, let alone the URL in question.

If it simplifies the issue, the value I want to extract will always be the same length. Is indexing an easy workaround?

Upvotes: 2

Views: 952

Answers (3)

undetected Selenium

Reputation: 193058

To print the last segment of the href attribute, i.e. zzde7e35d-8d9d-4763-95d2-9198684abb12, wait for the element with WebDriverWait and visibility_of_element_located(). Selenium returns href as a resolved absolute URL, so take the last path segment with [-1] rather than a fixed index. You can use any of the following locator strategies:

  • Using LINK_TEXT:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.LINK_TEXT, "Yes"))).get_attribute("href").split("/")[-1])
    
  • Using CSS_SELECTOR:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.Blue-Button[data-type='patch'][data-disable-with='Waiting'][href*='market']"))).get_attribute("href").split("/")[-1])
    
  • Using XPATH:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='Blue-Button' and @data-type='patch'][@data-disable-with='Waiting' and contains(@href, 'market')]"))).get_attribute("href").split("/")[-1])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant detailed discussion in "Find div aria label starting with certain text and then extract".

Upvotes: 0

Prophet

Reputation: 33351

You are possibly missing a delay.
Instead of

URL = driver.find_element_by_xpath('//a[contains(@href,"market")]')

Try using

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

URL = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//a[contains(@href,"market")]'))).get_attribute("href")
print(URL)

You also have to extract the href attribute value from the returned web element object, as shown in the code.
If this still does not work, check whether the element you are trying to access is inside an iframe, or whether the locator matches more than one element.
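
The iframe and uniqueness checks can be sketched roughly as follows. This is only an illustration under a few assumptions: the helper name count_matches_per_frame is made up, it inspects top-level iframes only (not nested ones), and it passes the locator strategies as plain strings ("xpath" and "tag name" are the string values of By.XPATH and By.TAG_NAME).

```python
def count_matches_per_frame(driver, xpath):
    """Count elements matching xpath in the main document and in each top-level iframe.

    Returns a list of (location, count) pairs. Hypothetical helper for debugging.
    """
    # "xpath" and "tag name" are the string values of By.XPATH and By.TAG_NAME
    results = [("main document", len(driver.find_elements("xpath", xpath)))]
    frames = driver.find_elements("tag name", "iframe")
    for index, frame in enumerate(frames):
        driver.switch_to.frame(frame)
        try:
            count = len(driver.find_elements("xpath", xpath))
            results.append((f"iframe {index}", count))
        finally:
            # Always return to the top-level document before the next frame
            driver.switch_to.default_content()
    return results
```

A count of 0 in the main document and 1 inside an iframe suggests you need driver.switch_to.frame(...) before locating the link. A count above 1 anywhere suggests the locator is not unique.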

Upvotes: 0

KunduK

Reputation: 33384

If you want the href, call get_attribute('href') on the element. Selenium returns the resolved URL, which ends in /market/opening/zzde7e35d-8d9d-4763-95d2-9198684abb12. Then split() it on "/" and take the last element.

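from selenium.webdriver.common.by import By

link = example.url
driver.get(link)
# find_element_by_xpath was removed in Selenium 4.3; find_element(By.XPATH, ...) is the equivalent call
URL = driver.find_element(By.XPATH, '//a[contains(@href,"market")]')
# get_attribute('href') returns a string; the last "/"-separated piece is the ID
print(URL.get_attribute('href').split("/")[-1])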

Output:

zzde7e35d-8d9d-4763-95d2-9198684abb12
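
On the indexing question: because the ID is always the same length, slicing would work, but splitting on the path is less brittle. Here is a minimal sketch that runs without a browser, assuming the href comes back as an absolute URL. The host example.com and the helper name last_path_segment are made up for illustration.

```python
from urllib.parse import urlsplit


def last_path_segment(href):
    """Return the final path segment of a URL, ignoring query and fragment."""
    path = urlsplit(href).path
    # rstrip guards against a trailing slash leaving an empty last segment
    return path.rstrip("/").rsplit("/", 1)[-1]


# Hypothetical absolute URL, as get_attribute("href") would typically return it
href = "https://example.com/market/opening/zzde7e35d-8d9d-4763-95d2-9198684abb12"

by_split = last_path_segment(href)
by_index = href[-37:]  # works only while the ID stays exactly 37 characters long

print(by_split)
print(by_index == by_split)
```

The slice breaks silently if the ID length ever changes or a query string is appended, whereas the path-based version keeps working in both cases.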

Upvotes: 2
