Reputation: 13
I am new to Python and Selenium WebDriver. I am trying to check all the links on my own webpage and use each link's HTTP status code to see whether it is broken. The code that I am running (reduced from the original)...
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import requests
links = driver.find_elements_by_xpath("//a[@href]")
while len(links):
    url = links.pop()
    url = url.get_attribute("href")
    print(url)
The HTML looks like...
<ul>
<li><a href = "https://www.google.com/">visit google</a></li>
<li><a href = "broken">broken link ex</a></li>
</ul>
When I run my script, the only link that gets printed is the google link and not the broken link. I ran some tests, and it seems that only the links that include the phrase "http://www" get printed. Although I could change the href values on my webpage to include this phrase, I have specific reasons for not doing so.
If I can just get all the links (with or without the "http://www" phrase) using driver.find_elements_by_xpath("//a[@href]"), then I can convert them later in the script to include the phrase and then get the HTTP status codes.
I saw other posts but none that helped me get over this obstacle. Any clarification/workaround/hint would be appreciated.
Upvotes: 1
Views: 8415
Reputation: 60604
The following list comprehension should get you a list of all links. It locates all anchor tags and builds a list containing the 'href' attribute of each element.
links = [elem.get_attribute("href") for elem in driver.find_elements_by_tag_name('a')]
Here is the same thing broken down into small steps and wrapped in a function:
def get_all_links(driver):
    links = []
    elements = driver.find_elements_by_tag_name('a')
    for elem in elements:
        href = elem.get_attribute("href")
        links.append(href)
    return links
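Once you have the raw href values, the "convert later, then get the status code" step from the question could look roughly like the sketch below. It is only an illustration under a few assumptions: resolve_links and get_status are hypothetical helper names, the base URL is whatever page the links were collected from, and the standard library's urllib is used here in place of requests so the snippet has no third-party dependency.

```python
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def resolve_links(hrefs, base_url):
    """Turn raw href values into absolute URLs, skipping missing or empty ones.

    Hypothetical helper: base_url is assumed to be the page the links came from.
    """
    resolved = []
    for href in hrefs:
        if not href:  # get_attribute() returns None when there is no href
            continue
        # urljoin leaves absolute URLs unchanged and resolves relative ones
        resolved.append(urljoin(base_url, href))
    return resolved


def get_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response was received.

    Hypothetical helper: sends a HEAD request, which some servers may reject.
    """
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError
    except (OSError, ValueError):
        # Covers DNS failures, refused connections, timeouts, and URLs
        # without a usable scheme
        return None
```

With a live driver, something like `for url in resolve_links(get_all_links(driver), driver.current_url): print(url, get_status(url))` would then print each link next to its status code. A status of None or a value of 400 and above could be treated as a broken link.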
Upvotes: 8