tigerninjaman

Reputation: 393

Selenium - find element by link text

I am using Selenium WebDriver with Chrome and Python 3 on Windows 10. I want to scrape some reports from a database. I search with a company ID and a year, and the results are a list of links named in a specific format, something like year_companyID_seeminglyRandomDateAndDoctype.extension, e.g. 2018_2330_20020713F04.pdf. I want to get all PDFs of a certain doctype. I can grab all links for a certain doctype using webdriver.find_elements_by_partial_link_text('F04'), or all links with a certain extension by passing '.pdf' instead of 'F04', but I cannot figure out a way to check for both at once. First I tried something like this:

links = webdriver.find_elements_by_partial_link_text('F04')
for l in links:
    if l.find('.pdf') == -1:
        continue
    else:
        #do some stuff

But unfortunately, the links are WebElements:

print(links[0])
>> <selenium.webdriver.remote.webelement.WebElement (session="78494f3527260607202e68f6d93668fe", element="0.8703868381417961-1")>
print(links[0].get_attribute('href'))
>> javascript:readfile2("F","2330","2015_2330_20160607F04.pdf")

so the conditional in the for loop above fails, because a WebElement is not a string and has no find method.

I see that I could probably access the necessary information in whatever that object is, but I would prefer to do the checks first when getting the links. Is there any way to check multiple conditions in the webdriver.find_elements_by_* methods?

Upvotes: 1

Views: 1123

Answers (2)

gmb468

Reputation: 3

Andersson's approach seems to work with a slight correction: the condition should be if link.get_attribute('href').endswith('.pdf')] rather than if link.get_attribute('href').endswith('.pdf")')], i.e. delete the ") from the suffix.

Upvotes: 0

Andersson

Reputation: 52685

You can try the code below:

links = [
    link.get_attribute('href')
    for link in webdriver.find_elements_by_partial_link_text('F04')
    if link.get_attribute('href').endswith('.pdf")')
]
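Which suffix to test depends on what the href actually contains: in the question it is a javascript: call that ends in .pdf"), while a plain URL would end in .pdf. A more tolerant variant is to pull the file name out of the href first and test that. The following is only a minimal sketch, assuming the hrefs look like the one printed in the question and that the doctype sits directly before the extension; file_name_from_href, filter_links and FakeLink are hypothetical names introduced here, and FakeLink merely stands in for a WebElement so the logic can be checked without a browser.

```python
import re


def file_name_from_href(href):
    """Return the file name carried by an href.

    Assumption: the href is either a javascript: call whose last
    double-quoted argument is the file name (as in the question),
    or a plain URL whose last path segment is the file name.
    """
    quoted = re.findall(r'"([^"]*)"', href)
    if quoted:
        return quoted[-1]
    return href.rsplit('/', 1)[-1]


def filter_links(elements, doctype='F04', extension='.pdf'):
    """Keep the file names that end with doctype + extension.

    `elements` can be any objects with a get_attribute('href') method,
    for example the list returned by a Selenium find_elements call.
    """
    wanted = []
    for element in elements:
        # get_attribute may return None when the attribute is missing
        href = element.get_attribute('href') or ''
        name = file_name_from_href(href)
        if name.endswith(doctype + extension):
            wanted.append(name)
    return wanted


class FakeLink:
    """Hypothetical stand-in for a WebElement, used only for a dry run."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == 'href' else None
```

With a real driver, the same filter could be applied to the result of the partial-link-text lookup, e.g. filter_links(webdriver.find_elements_by_partial_link_text('F04')).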

You can also try an XPath expression, which checks both conditions in a single lookup:

links = webdriver.find_elements_by_xpath('//a[contains(., "F04") and contains(@href, ".pdf")]')
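The XPath above hard-codes the doctype and the extension. As a rough sketch, the expression can be built from parameters and passed to the generic find_elements call, which is the form current Selenium 4 releases expect now that the find_elements_by_* helpers have been removed. The names build_xpath and find_report_links are hypothetical helpers, not part of Selenium, and the quoting check is a simplification of this sketch.

```python
def build_xpath(doctype='F04', extension='.pdf'):
    """Build an XPath matching <a> elements whose text contains the
    doctype and whose href contains the extension."""
    for value in (doctype, extension):
        # XPath 1.0 string literals have no escape syntax, so this
        # sketch simply refuses values containing a double quote.
        if '"' in value:
            raise ValueError('double quotes are not supported in this sketch')
    return '//a[contains(., "{}") and contains(@href, "{}")]'.format(
        doctype, extension)


def find_report_links(driver, doctype='F04', extension='.pdf'):
    """Return the matching link elements from a Selenium driver.

    'xpath' is the value of selenium's By.XPATH constant; in real code
    you would normally import By and pass By.XPATH instead.
    """
    return driver.find_elements('xpath', build_xpath(doctype, extension))
```

Note that contains(@href, ".pdf") matches the extension anywhere in the href, not only at its end, so a stricter check on the returned hrefs may still be worthwhile.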

Upvotes: 1
