Taimoor Ali

Reputation: 178

Parsing HTML of YouTube playlists

I am having trouble parsing the HTML of YouTube playlists. For example, when I inspect the tags of "https://www.youtube.com/playlist?list=PLS1QulWo1RIYmaxcEqw5JhK3b-6rgdWO_" in the browser, I see the class name "yt-simple-endpoint.style-scope.ytd-playlist-video-renderer", but selecting elements by that class with bs4 does not work. I found another piece of code online that does work; it selects the class "pl-video-title-link". However, I cannot find this class anywhere on the web page, and none of the tags appear to have it. The working code is below. Any help would be appreciated.

from bs4 import BeautifulSoup as bs
import requests

# Download the playlist page
r = requests.get('https://www.youtube.com/playlist?list=PLS1QulWo1RIYmaxcEqw5JhK3b-6rgdWO_')
page = r.text
soup = bs(page, 'html.parser')
# Collect every <a> tag with the class pl-video-title-link
res = soup.find_all('a', {'class': 'pl-video-title-link'})
for l in res:
    print(l.get("href"))

Upvotes: 2

Views: 5014

Answers (1)

Andrej Kesely

Reputation: 195573

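This page changes its structure with JavaScript after it loads, so the markup shown in the browser inspector differs from what requests downloads. Print the soup right after downloading to see where the video links are initially. In this case, they are in <tr> tags with class pl-video: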

from bs4 import BeautifulSoup
import requests

url = 'https://www.youtube.com/playlist?list=PLS1QulWo1RIYmaxcEqw5JhK3b-6rgdWO_'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')

for i, tr in enumerate(soup.select('tr.pl-video')):
    print('{}. {}'.format(i + 1, tr['data-title']))
    print('https://www.youtube.com' + tr.a['href'])
    print('-' * 80)

Prints:

1. Shell Scripting Tutorial for Beginners 1 -  Introduction
https://www.youtube.com/watch?v=cQepf9fY6cE&list=PLS1QulWo1RIYmaxcEqw5JhK3b-6rgdWO_&index=2&t=0s
--------------------------------------------------------------------------------
2. Shell Scripting Tutorial for Beginners 2 - using Variables and Comments
https://www.youtube.com/watch?v=vQv4W-JfrmQ&list=PLS1QulWo1RIYmaxcEqw5JhK3b-6rgdWO_&index=3&t=0s
--------------------------------------------------------------------------------
3. Shell Scripting Tutorial for Beginners 3 - Read User Input
https://www.youtube.com/watch?v=AcSkkNAsGCY&list=PLS1QulWo1RIYmaxcEqw5JhK3b-6rgdWO_&index=4&t=0s
--------------------------------------------------------------------------------

... all the way to:

32. How Install VirtualBox Guest Additions on Ubuntu 18.04 Guest / virtual machine
https://www.youtube.com/watch?v=qNecdUsuTPw&list=PLS1QulWo1RIYmaxcEqw5JhK3b-6rgdWO_&index=33&t=0s
--------------------------------------------------------------------------------
33. How to install Java JDK 10 on Ubuntu 18.04 LTS (Debian Linux)
https://www.youtube.com/watch?v=4RJ60fqeTN4&list=PLS1QulWo1RIYmaxcEqw5JhK3b-6rgdWO_&index=34&t=0s
--------------------------------------------------------------------------------
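To check which class names actually exist in the downloaded markup (as opposed to the JavaScript-rendered DOM shown in the browser inspector), one option is to count them directly. The following is a minimal sketch using only the standard library; ClassCounter, classes_in_html and SAMPLE_HTML are hypothetical names, and the sample markup is a made-up stand-in for the text returned by requests.get(url).text, not YouTube's real HTML.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count every CSS class name that appears in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "class" and value:
                self.counts.update(value.split())


def classes_in_html(html):
    """Return a Counter mapping class name -> number of tags using it."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Made-up stand-in for page.text (assumption, not real YouTube markup)
SAMPLE_HTML = """
<table>
  <tr class="pl-video yt-uix-tile" data-title="Video one">
    <td><a class="pl-video-title-link" href="/watch?v=aaa">Video one</a></td>
  </tr>
  <tr class="pl-video yt-uix-tile" data-title="Video two">
    <td><a class="pl-video-title-link" href="/watch?v=bbb">Video two</a></td>
  </tr>
</table>
"""

counts = classes_in_html(SAMPLE_HTML)
# A class that exists only in the JavaScript-rendered DOM shows a count of 0
for name in ("pl-video-title-link", "ytd-playlist-video-renderer"):
    print(name, counts[name])
```

Passing page.text instead of SAMPLE_HTML would show which of the two class names from the question is present in the response that requests receives.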

Upvotes: 1
