Reputation: 3781
I am working with Python 2.7. I want to create a text file listing the videos in a particular YouTube playlist.
Here is what I wrote (I'm totally new to Python):
from bs4 import BeautifulSoup
import urllib2
import re
url='https://www.youtube.com/playlist?list=PLYjSYQBFeM-zQeZFpWeZ_4tnhc3GQWNj8'
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
href_tags = soup.find_all(href=True)
ff = open("C:/exp/file.txt", "w")
and then this worked:
for i in href_tags:
    ff.write(str(i))
ff.close()
But, since I want to keep only those that have "watch" inside, I tried instead:
for i in href_tags:
    if re.findall('watch',str(i))=='watch':
        ff.write(str(i))
ff.close()
But I got an empty text file.
How can I keep only the links? Is there a better way to do this?
Upvotes: 0
Views: 2566
Reputation: 85
Alternatively, you can install youtube-dl and run it from the command line (or call it from Python with the subprocess module):
youtube-dl https://www.youtube.com/playlist?list=PLrhzvIcii6GNjpARdnO4ueTUAVR9eMBpc --yes-playlist --get-title
youtube-dl https://www.youtube.com/playlist?list=PLrhzvIcii6GNjpARdnO4ueTUAVR9eMBpc --yes-playlist --get-url
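A minimal sketch of driving those same commands from Python with subprocess. The helper names build_command and run_youtube_dl are hypothetical, and the sketch assumes youtube-dl is installed and on the PATH; only the flags shown above are used.

```python
import subprocess


def build_command(playlist_url, flag):
    """Build the youtube-dl argument list for one playlist query (hypothetical helper)."""
    return ["youtube-dl", playlist_url, "--yes-playlist", flag]


def run_youtube_dl(playlist_url, flag):
    """Run youtube-dl and return its output as a list of non-empty lines.

    Assumes youtube-dl is on the PATH and that network access is available.
    """
    output = subprocess.check_output(build_command(playlist_url, flag))
    return [line for line in output.decode("utf-8").splitlines() if line]


# Example call (not executed here, since it needs youtube-dl and the network):
# titles = run_youtube_dl("put Youtube playlist url here", "--get-title")
```

Passing the arguments as a list avoids shell quoting problems with the "?" and "=" characters in the playlist URL.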
Upvotes: 0
Reputation: 457
# This code will work if you are willing to use a newer version of Python.
from bs4 import BeautifulSoup
import requests

class Playlist():
    def __init__(self, playListUrl):
        self._playListUrl = playListUrl
        # Fetch the HTML text of the YouTube playlist page and store it in self._htmldoc.
        self._htmldoc = requests.get(str(self._playListUrl)).text
        self._soup = BeautifulSoup(self._htmldoc, 'html.parser')
        # Collect every <a> tag with the class 'pl-video-title-link' (the video title links).
        self._rawList = self._soup('a', {'class': 'pl-video-title-link'})
        # Loop through the links and print each title followed by its YouTube URL.
        for link in self._rawList:
            print('{0}'.format(link.string) + 'http://youtube.com' + '{0}'.format(link.get('href')))

# To use this class:
# 1 - Create a new Playlist object.
# 2 - Pass a YouTube playlist URL where shown below.
# 3 - Run it, and enjoy.
objPlaylist = Playlist('put Youtube playlist url here')
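If only the link URLs are wanted (as in the question) rather than printed titles, the same idea can be sketched without any third-party package. This swaps BeautifulSoup for the standard library's html.parser, runs on Python 3, and uses a made-up HTML sample instead of a live playlist page; the class name WatchLinkParser and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class WatchLinkParser(HTMLParser):
    """Collect absolute URLs of <a> tags whose href contains 'watch'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href and 'watch' in href:
            # Turn the relative href into a full URL.
            self.links.append(urljoin(self.base_url, href))


# Made-up sample standing in for the downloaded playlist HTML.
sample_html = '''
<a class="pl-video-title-link" href="/watch?v=abc123&amp;list=PL1">First video</a>
<a href="/about">About</a>
<a class="pl-video-title-link" href="/watch?v=def456&amp;list=PL1">Second video</a>
'''

parser = WatchLinkParser('https://www.youtube.com')
parser.feed(sample_html)

# One URL per line, ready to be written to a text file.
output = '\n'.join(parser.links)
```

The resulting string can then be written with open(path, "w") in the same way as in the question.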
Upvotes: 3
Reputation: 4862
A simple "in" check should do. re.findall returns a list, so comparing its result to the string 'watch' is always False:
for i in href_tags:
    if 'watch' in str(i):
        ff.write(str(i))
ff.close()
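A small sketch of why the original comparison produced an empty file and the membership test does not. The two tag strings are made-up stand-ins for str(i), and the function names are hypothetical.

```python
import re

# Made-up stand-ins for str(i) on two of the tags found by find_all.
tags = [
    '<a href="/watch?v=abc123">First</a>',
    '<a href="/about">About</a>',
]


def has_watch_findall(tag):
    # re.findall returns a list such as ['watch'], never the string 'watch',
    # so this comparison is False for every tag.
    return re.findall('watch', tag) == 'watch'


def has_watch_in(tag):
    # Plain substring test on the tag text.
    return 'watch' in tag


broken = [t for t in tags if has_watch_findall(t)]
fixed = [t for t in tags if has_watch_in(t)]
```

Here broken stays empty while fixed keeps only the first tag, which matches the behaviour described in the question.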
Upvotes: 0