Reputation: 61
I'm working on a project where I need to store the date a YouTube video was published.
The problem is that I can't find this data in the HTML source code.
Here's my attempt:
import requests
from bs4 import BeautifulSoup as BS
url = "https://www.youtube.com/watch?v=XQgXKtPSzUI&t=915s"
response = requests.get(url)
soup = BS(response.content, "html.parser")
response.close()
dia = soup.find_all('span',{'class':'date'})
print(dia)
Output:
[]
I know that the arguments I'm passing to .find_all() are wrong.
I say this because the same code let me store other information from the video, such as the title and the view count.
I've tried different arguments to .find_all() but couldn't figure out which ones find the date.
Upvotes: 1
Views: 3625
Reputation: 1
Try passing the attributes through the attrs keyword, as shown below:
dia = soup.find_all('span', attrs={'class': 'date'})
Upvotes: 0
Reputation: 2511
If you use Python with pafy, the object you get back exposes the published date directly.
Install pafy: pip install pafy
import pafy
vid = pafy.new("www.youtube.com/watch?v=2342342whatever")
published_date = vid.published
print(published_date)  # Python 3 print function
Check out the pafy docs for more info: https://pythonhosted.org/Pafy/ I'm leaving the doc link because it's a really neat module: it fetches the data without external request modules and also exposes a bunch of other useful properties of the video, like the best-format download link.
Upvotes: 3
Reputation: 383
It seems that YouTube uses JavaScript to add the date, so there is no span element for it in the HTML that requests downloads. You could scrape with Selenium, which runs the JavaScript, or read the date out of the JavaScript data embedded in the page source, where it appears directly.
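To illustrate the second option, here is a minimal sketch that searches the raw page source with regular expressions instead of looking for a span. Both patterns are assumptions about what the page contains (a datePublished meta tag and a "publishDate" key in the embedded JavaScript data); neither is a documented YouTube interface, so either may be missing or may change. The function name extract_publish_date and the sample strings are hypothetical.

```python
import re
from typing import Optional

# Assumed patterns: neither is a documented YouTube interface, and both may change.
META_PATTERN = re.compile(r'<meta\s+itemprop="datePublished"\s+content="([^"]+)"')
JSON_PATTERN = re.compile(r'"publishDate"\s*:\s*"([^"]+)"')


def extract_publish_date(html: str) -> Optional[str]:
    """Return the first publish-date string found in the page source, or None."""
    for pattern in (META_PATTERN, JSON_PATTERN):
        match = pattern.search(html)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Hypothetical fragments standing in for a downloaded page.
    sample_meta = '<meta itemprop="datePublished" content="2019-05-14">'
    sample_json = 'var ytInitialPlayerResponse = {"publishDate":"2019-05-14"};'
    print(extract_publish_date(sample_meta))
    print(extract_publish_date(sample_json))
    print(extract_publish_date("<span>no date here</span>"))
```

With the code from the question, you would pass response.text to extract_publish_date() and treat a None result as a sign that the assumed patterns no longer match the page.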
Upvotes: 0