Reputation: 83
I want to find the broken links on my web page using Selenium + Python. I tried the code below, but it shows me the following error:
requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?
Code trials:
for link in links:
    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code)
Full code:
def test_lsearch(self):
    driver = self.driver
    driver.get("http://www.google.com")
    driver.set_page_load_timeout(10)
    driver.find_element_by_name("q").send_keys("selenium")
    driver.set_page_load_timeout(10)
    el = driver.find_element_by_name("btnK")
    el.click()
    time.sleep(5)
    links = driver.find_elements_by_css_selector("a")
    for link in links:
        r = requests.head(link.get_attribute('href'))
        print(link.get_attribute('href'), r.status_code)
Upvotes: 4
Views: 11017
Reputation: 193188
This error message...
raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?
...implies that URL parsing failed for one of the collected href attributes: at least one of the matched &lt;a&gt; elements carries no href, so get_attribute('href') returns None, and requests then tries to request the literal URL 'None'.
This error is defined in models.py as follows:
# Support for unicode domain names and paths.
scheme, auth, host, port, path, query, fragment = parse_url(url)
if not scheme:
    raise MissingSchema("Invalid URL {0!r}: No schema supplied. "
                        "Perhaps you meant http://{0}?".format(url))
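The practical consequence is that the collected href values need to be filtered before calling requests.head(), because anchors without an href yield None. A minimal sketch of such a guard (checkable_urls is a hypothetical helper, not part of requests or Selenium):

```python
def checkable_urls(hrefs):
    """Keep only href values that requests can actually fetch.

    get_attribute('href') returns None for anchors without an href,
    and schemes like javascript: or mailto: are not checkable over HTTP.
    """
    return [h for h in hrefs if h and h.startswith(("http://", "https://"))]

# Inside the Selenium loop this would look like:
#   for url in checkable_urls(link.get_attribute('href') for link in links):
#       r = requests.head(url)
```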
Possibly you are trying to look for broken links once the search results for the keyword selenium are available from the Google Home Page Search Box. To achieve that you can use the following solution:
Code Block:
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('https://google.co.in/')
search = driver.find_element_by_name('q')
search.send_keys("selenium")
search.send_keys(Keys.RETURN)
links = WebDriverWait(driver, 10).until(EC.visibility_of_any_elements_located((By.XPATH, "//div[@class='rc']//h3//ancestor::a[1]")))
print("Number of links : %s" %len(links))
for link in links:
    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code)
Console Output:
Number of links : 9
https://www.seleniumhq.org/ 200
https://www.seleniumhq.org/download/ 200
https://www.seleniumhq.org/docs/01_introducing_selenium.jsp 200
https://www.guru99.com/selenium-tutorial.html 200
https://en.wikipedia.org/wiki/Selenium_(software) 200
https://github.com/SeleniumHQ 200
https://www.edureka.co/blog/what-is-selenium/ 200
https://seleniumhq.github.io/selenium/docs/api/py/ 200
https://seleniumhq.github.io/docs/ 200
As per your counter question, it would be a bit tough to answer canonically, from a Selenium perspective, why xpath worked but tagName did not. Perhaps you may like to dig deeper into the related discussions on that topic.
Upvotes: 2
Reputation: 2015
Try this. I'm pretty sure there are better ways to accomplish it, and it may or may not solve your problem, but in the short time I had, I came up with this approach and it seems to be working for me:
import itertools
import requests
from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys
driver = Chrome()
driver.get('https://www.google.com/')
# Search 'selenium'
search = driver.find_element_by_css_selector('input[aria-label="Search"]')
search.send_keys('selenium')
search.send_keys(Keys.ENTER)
# Results div
container = driver.find_element_by_id('rso')
results = container.find_elements_by_css_selector('.bkWMgd')
del results[1]
# links
_links = []
for result in results:
    _links.append([r.get_attribute('href') for r in result.find_elements_by_css_selector('.r>a')])
driver.quit()
links = list(itertools.chain.from_iterable(_links))
for link in links:
    r = requests.get(link)
    print(link, r.status_code)
Output:
https://www.seleniumhq.org/ 200
https://www.seleniumhq.org/projects/webdriver/ 200
https://www.webmd.com/a-to-z-guides/supplement-guide-selenium 200
https://www.healthline.com/nutrition/selenium-benefits 200
https://github.com/SeleniumHQ/selenium 200
https://en.wikipedia.org/wiki/Selenium_(software) 200
https://www.medicalnewstoday.com/articles/287842.php 200
https://ods.od.nih.gov/factsheets/Selenium-Consumer/ 200
https://selenium-python.readthedocs.io/ 200
https://selenium-python.readthedocs.io/installation.html 200
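One hardening note on the loop above: requests.get() downloads the full response body and raises on network failures, so a HEAD request with a timeout and an exception guard is cheaper and more robust for link checking. A sketch, assuming the same links list; link_status is a hypothetical helper, not part of the answer:

```python
import requests

def link_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        # HEAD avoids downloading the response body; follow redirects so
        # permanently-moved pages report their final status.
        r = requests.head(url, allow_redirects=True, timeout=timeout)
        return r.status_code
    except requests.RequestException:
        return None

# for link in links:
#     print(link, link_status(link))
```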
Upvotes: 0