Reputation: 143
I have some simple selenium scraping code that returns all the search results, but when I run the for loop, it displays an error: Message: invalid argument: 'url' must be a string
(Session info: chrome=93.0.4577.82)
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
chrome_path = r'C:\Windows\chromedriver.exe'
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.youtube.com/results?search_query=python+course")
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
links = []
for i in user_data:
    links.append(i.get_attribute('href'))
print(links)
wait = WebDriverWait(driver, 10)
for x in links:
    driver.get(x)
    v_id = x.strip('https://www.youtube.com/watch?v=')
    #//*[@id="video-title"]/yt-formatted-string
    v_title = wait.until(EC.presence_of_element_located(
        (By.CSS_SELECTOR, "h1.title yt-formatted-string"))).text
I would like to ask for some help: how can I avoid this error? Thanks.
Upvotes: 2
Views: 1918
Reputation: 33361
You are trying to get "user_data" with
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
immediately after opening the YouTube URL with
driver.get("https://www.youtube.com/results?search_query=python+course")
At that point the page has not finished loading, so "user_data" is either an empty list or a list of elements whose href attribute is not yet set. That is why, when you iterate over "links" with
for x in links:
each single "x" value is a "NoneType" object, not a string, and driver.get(x) fails.
To fix this you should add a wait/delay between
driver.get("https://www.youtube.com/results?search_query=python+course")
and
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
The simplest way to do that is to add a hard-coded delay there, like this:
import time

driver.get("https://www.youtube.com/results?search_query=python+course")
time.sleep(8)
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
However, the recommended approach is to use an explicit wait implemented with expected conditions, like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

chrome_path = r'C:\Windows\chromedriver.exe'
driver = webdriver.Chrome(chrome_path)
wait = WebDriverWait(driver, 20)

driver.get("https://www.youtube.com/results?search_query=python+course")
wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="video-title"]')))
# add a short extra pause so all the videos finish loading
time.sleep(0.5)
user_data = driver.find_elements_by_xpath('//*[@id="video-title"]')
links = []
for i in user_data:
    links.append(i.get_attribute('href'))
print(links)
for x in links:
    driver.get(x)
    # str.strip removes a set of characters, not a prefix, so split on the query parameter instead
    v_id = x.split('watch?v=')[-1]
    #//*[@id="video-title"]/yt-formatted-string
    v_title = wait.until(EC.visibility_of_element_located(
        (By.CSS_SELECTOR, "h1.title yt-formatted-string"))).text
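A side note on extracting the video id: `x.strip('https://www.youtube.com/watch?v=')` does not remove that prefix as a whole. `str.strip` removes any of the given *characters* from both ends of the string, so it can also eat into the video id itself whenever the id starts or ends with one of those characters. A safer sketch (plain string handling, no Selenium needed; the helper name `video_id` is just for illustration):

```python
def video_id(url):
    # Take everything after the "v=" query parameter, then drop any
    # trailing parameters such as "&t=10s".
    return url.split("watch?v=")[-1].split("&")[0]

print(video_id("https://www.youtube.com/watch?v=abc123"))        # abc123
print(video_id("https://www.youtube.com/watch?v=abc123&t=10s"))  # abc123
```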
Also, you should use visibility_of_element_located instead of presence_of_element_located, since presence_of_element_located waits only for the element's initial presence in the DOM; the element's state and content (such as its text) may still not be ready.
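The difference between the two conditions can be illustrated without a browser. The sketch below is a simplified model, not Selenium's actual implementation: `FakeElement`, `presence`, and `visibility` are stand-ins for a located element, presence_of_element_located, and visibility_of_element_located respectively.

```python
class FakeElement:
    """Stand-in for a WebElement that is in the DOM but may not be rendered."""
    def __init__(self, displayed):
        self.displayed = displayed

def presence(find):
    # Satisfied as soon as the element exists in the DOM at all.
    return find() is not None

def visibility(find):
    # Additionally requires the element to be rendered/visible.
    el = find()
    return el is not None and el.displayed

# Element is in the DOM but not yet rendered (e.g. still hidden while loading):
hidden = FakeElement(displayed=False)
print(presence(lambda: hidden))    # True  - presence is already satisfied
print(visibility(lambda: hidden))  # False - visibility would keep waiting
```

This is why a wait built on presence can hand you an element whose text is still empty, while a wait built on visibility does not return until the element is actually displayed.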
Upvotes: 1