Reputation: 2328
I wanted to write something that would return the duration of a YouTube video given its link. So I found requests and lxml and started out following this guide.
Here's the setup:
import requests
from lxml import html
url = 'https://www.youtube.com/watch?v=EN8fNb6uhns'
page = requests.get(url)
tree = html.fromstring(page.content)
Then I try to use XPath to get the duration, but it doesn't work. This query:
tree.xpath('//span[@class="ytp-time-duration"]/text()')
returns an empty list. But when I try to get the title (as a test) with:
tree.xpath('//h1[@class="watch-title-container"]/span/text()')
it works. When I use the browser's inspector to copy the XPath of the duration element, nothing is returned:
tree.xpath('/html/body/div[2]/div[4]/div/div[4]/div[2]/div[2]/div/div[24]/div[2]/div[1]/div/span[3]')
When I do the same for the title it works again.
What is going on?
Upvotes: 2
Views: 658
Reputation: 31
For YouTube, the XPath was not consistent. I got two different XPaths for the video duration element:
//*[@id='movie_player']/div[5]/div/div/div[5]/button/div[1]
//*[@id="movie_player"]/div[26]/div[2]/div[1]/div/span[3]
So I tried finding the element by class name instead:
FindElement(By.ClassName("ytp-time-duration"))
This always worked.
string VideoDuration = firfxdrivr.FindElement(By.ClassName("ytp-time-duration")).GetAttribute("textContent");
Console.WriteLine(VideoDuration);
Output: 19:18
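Since the question uses Python, here is a rough Python equivalent of the C# snippet above. It is only a sketch: it assumes Selenium 4 and a working Firefox driver are installed, and the function name get_duration_text and the 10-second timeout are made up for illustration. The explicit wait is an addition that is not in the C# code; it is there because the player may take a moment to create the element.

```python
def get_duration_text(url, timeout=10):
    """Return the text of the ytp-time-duration element (hypothetical helper)."""
    # Imported here so the sketch can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Wait until the JavaScript player has added the element to the DOM.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, "ytp-time-duration"))
        )
        # textContent is read instead of .text because the element may be hidden.
        return element.get_attribute("textContent")
    finally:
        driver.quit()
```

Calling get_duration_text('https://www.youtube.com/watch?v=EN8fNb6uhns') would open a real browser window, so it needs a desktop session or a headless configuration.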
Upvotes: 0
Reputation: 12158
The span[@class="ytp-time-duration"] tag is generated by JavaScript, so it is not in what requests returns. requests just returns the HTML code sent by the server; it does not run any scripts.
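To see the effect without touching YouTube, here is a small sketch using lxml on a made-up page. The STATIC_HTML string is an assumption standing in for page.content; it is not YouTube's real markup. The title is in the static HTML, while the duration span only exists as JavaScript source text inside a script tag, so the same two XPath queries from the question behave differently.

```python
from lxml import html

# Hypothetical stand-in for what requests.get(url).content would hold.
# The duration span is only created when a browser runs the script.
STATIC_HTML = """
<html><body>
  <h1 class="watch-title-container"><span>Example title</span></h1>
  <div id="movie_player"></div>
  <script>
    var span = document.createElement("span");
    span.className = "ytp-time-duration";
    span.textContent = "19:18";
    document.getElementById("movie_player").appendChild(span);
  </script>
</body></html>
"""

tree = html.fromstring(STATIC_HTML)

# Present in the static HTML, so the query finds it.
title = tree.xpath('//h1[@class="watch-title-container"]/span/text()')

# Never created, because lxml parses markup and does not execute scripts.
duration = tree.xpath('//span[@class="ytp-time-duration"]/text()')

print(title)
print(duration)
```

The first query returns the title text and the second returns an empty list, which matches what the question describes.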
Upvotes: 1