Reputation: 393
I am using Selenium WebDriver with Chrome and Python 3 on Windows 10.
I want to scrape some reports from a database. I search with a company ID and a year, and the results are a list of links named in a specific way, something like year_companyID_seeminglyRandomDateAndDoctype.extension, e.g. 2018_2330_20020713F04.pdf. I want to get all PDFs of a certain doctype. I can grab all links for a certain doctype using webdriver.find_elements_by_partial_link_text('F04'), or all links with that extension by passing '.pdf' instead of 'F04', but I cannot figure out a way to check for both at once. First I tried something like:
    links = webdriver.find_elements_by_partial_link_text('F04')
    for l in links:
        if l.find('.pdf') == -1:
            continue
        else:
            #do some stuff
But unfortunately, the links are WebElements:
print(links[0])
>> <selenium.webdriver.remote.webelement.WebElement (session="78494f3527260607202e68f6d93668fe", element="0.8703868381417961-1")>
print(links[0].get_attribute('href'))
>> javascript:readfile2("F","2330","2015_2330_20160607F04.pdf")
so the conditional in the for loop above fails: a WebElement is not a string and has no find method.
I see that I could probably pull the necessary information out of the WebElement, but I would prefer to apply both checks when fetching the links. Is there any way to check multiple conditions in the webdriver.find_elements_by_* methods?
Upvotes: 1
Reputation: 3
Andersson's approach seems to work with a slight correction: use if link.get_attribute('href').endswith('.pdf')] rather than if link.get_attribute('href').endswith('.pdf")')], i.e. drop the trailing ") from the suffix.
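Which suffix matches depends on what get_attribute('href') returns on your page. Here is a minimal sketch using the href string printed in the question; href_has_extension is a hypothetical helper name, not part of Selenium, and it assumes any trailing characters after the extension are only quotes, a closing parenthesis, or a semicolon.

```python
# The href exactly as printed in the question.
question_href = 'javascript:readfile2("F","2330","2015_2330_20160607F04.pdf")'

# For this string the JavaScript call suffix matches, the bare extension does not.
matches_call_suffix = question_href.endswith('.pdf")')
matches_plain_suffix = question_href.endswith('.pdf')


def href_has_extension(href, extension):
    """Hypothetical helper: check the extension whether or not the href
    ends with the closing '")' of a javascript: call."""
    # rstrip removes any trailing run of the characters ", ) and ;
    return href.rstrip('");').endswith(extension)
```

With a check like this the same condition works for both a javascript: href and a plain URL that ends in the file name.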
Upvotes: 0
Reputation: 52685
You can try the code below:
links = [link.get_attribute('href') for link in webdriver.find_elements_by_partial_link_text('F04') if link.get_attribute('href').endswith('.pdf")')]
You can also try an XPath expression that checks both the link text and the href:
links = webdriver.find_elements_by_xpath('//a[contains(., "F04") and contains(@href, ".pdf")]')
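If you would rather filter in Python after fetching the elements, something like the following might work. It is only a sketch: filename_from_href and filter_links are hypothetical helper names, and it assumes the href has the javascript:readfile2("...","...","file") shape shown in the question and that the doctype sits immediately before the extension, as in 2018_2330_20020713F04.pdf.

```python
import re

# Captures the last double-quoted argument of an href such as
# javascript:readfile2("F","2330","2015_2330_20160607F04.pdf")
_FILENAME_RE = re.compile(r'"([^"]+)"\)\s*;?\s*$')


def filename_from_href(href):
    """Return the file name embedded in the href, or None if there is none."""
    match = _FILENAME_RE.search(href or "")
    return match.group(1) if match else None


def filter_links(elements, doctype, extension):
    """Keep elements whose embedded file name ends with doctype + extension.

    `elements` can be any objects exposing get_attribute('href'),
    for example the WebElements returned by a find_elements_* call.
    """
    suffix = doctype + extension  # e.g. "F04" + ".pdf"
    kept = []
    for element in elements:
        name = filename_from_href(element.get_attribute("href"))
        if name is not None and name.endswith(suffix):
            kept.append(element)
    return kept
```

With a live driver you would pass the result of find_elements_by_partial_link_text('F04') as elements, e.g. filter_links(links, 'F04', '.pdf').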
Upvotes: 1