Reputation: 1054
I am trying to extract the title, duration, and link of every video on a YouTube channel. I used Selenium and Python as follows:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
results = []
url = "https://www.youtube.com/channel/<channel name>/videos"
driver.get(url)

# Keep scrolling until the page height stops growing (all videos loaded)
while True:
    prev_ht = driver.execute_script("return document.documentElement.scrollHeight;")
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    time.sleep(2)
    ht = driver.execute_script("return document.documentElement.scrollHeight;")
    if prev_ht == ht:
        break

links = driver.find_elements(By.XPATH, '//*[@class="style-scope ytd-grid-renderer"]')
for link in links:
    print()
    print(link.get_attribute("href"), link.get_attribute("text"))
When I try to get the duration of each video using the class="style-scope ytd-thumbnail-overlay-time-status-renderer" class, the driver reports that the element doesn't exist. I did manage to get the other two features, though.
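The scroll loop above stops once the page height no longer changes after a scroll. A minimal sketch of that same termination logic, pulled out into a function so it can be reused or tested without a browser. The name scroll_until_stable and its max_rounds safety cap are hypothetical additions, not part of Selenium.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    pause: float = 2.0,
    max_rounds: int = 100,
) -> int:
    """Scroll repeatedly until the page height stops growing.

    max_rounds is a hypothetical safety cap so the loop cannot run forever.
    Returns the number of scroll rounds performed.
    """
    rounds = 0
    prev_height = get_height()
    while rounds < max_rounds:
        scroll_to_bottom()
        rounds += 1
        time.sleep(pause)          # give the page time to append more items
        height = get_height()
        if height == prev_height:  # nothing new was loaded, so we are done
            break
        prev_height = height
    return rounds


# With Selenium, the two callables would wrap the same calls as the question:
# scroll_until_stable(
#     lambda: driver.execute_script("return document.documentElement.scrollHeight;"),
#     lambda: driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);"),
# )
```

Passing the height and scroll actions in as callables keeps the stopping rule separate from the browser, so it can be checked with fake heights.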
Upvotes: 1
Views: 794
Reputation: 33351
Your XPath locator is not correct; use the following instead:
from selenium.webdriver.common.by import By
links = driver.find_elements(By.XPATH, '//*[name() = "ytd-grid-video-renderer" and @class="style-scope ytd-grid-renderer"]')
Now, to get the length of each video among the links you collected, you can do the following:
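from selenium.webdriver.common.by import By

links = driver.find_elements(By.XPATH, '//*[name() = "ytd-grid-video-renderer" and @class="style-scope ytd-grid-renderer"]')
for link in links:
    # Search relative to this video element (note the leading '.') for the span whose class contains "time-status"
    duration = link.find_element(By.XPATH, './/span[contains(@class,"time-status")]').text
    print(duration)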
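The duration text read this way is a label such as 4:05 or 1:02:03. If you need it as a number (for sorting or totals), a small helper can convert it to seconds. This is a sketch assuming the label is plain colon-separated digits; duration_to_seconds is a hypothetical helper, and anything else (for example an empty string when the overlay has not rendered yet) raises ValueError.

```python
def duration_to_seconds(text: str) -> int:
    """Convert a label such as '4:05' or '1:02:03' into a number of seconds."""
    parts = text.strip().split(":")
    # Accept only S, M:SS or H:MM:SS made of digits
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognised duration: {text!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + int(part)  # shift previous units up by one base-60 place
    return seconds


print(duration_to_seconds("12:34"))  # 754
```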
Upvotes: 1
Reputation: 60
Good Morning!
Selenium can have trouble reading the video duration if the cursor is not in exactly the right spot. You can get around this by calling the JavaScript functions that YouTube's player element exposes on a video's watch page. Here's an example:
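# getCurrentTime() returns the current playback position in seconds;
# getDuration() returns the video's total length in seconds
current_time = driver.execute_script(
    "return document.getElementById('movie_player').getCurrentTime()")
video_len = driver.execute_script(
    "return document.getElementById('movie_player').getDuration()")
video_len = int(video_len) / 60  # whole seconds converted to (fractional) minutes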
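Note that int(video_len) / 60 gives fractional minutes (a 754-second video becomes roughly 12.57), not minutes and seconds. If you want the familiar M:SS or H:MM:SS form instead, divmod does the split. format_duration is a hypothetical helper, and rounding to the nearest whole second is an assumption.

```python
def format_duration(seconds: float) -> str:
    """Format a length in seconds (e.g. from getDuration()) as M:SS or H:MM:SS."""
    total = int(round(seconds))       # assumption: round to the nearest second
    hours, rem = divmod(total, 3600)  # split off whole hours
    minutes, secs = divmod(rem, 60)   # split the rest into minutes and seconds
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


print(format_duration(754.4))  # 12:34
```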
Have a great day!
Upvotes: 0