Johnny

Reputation: 512

Python: get all YouTube video URLs of a channel

I want to get all video URLs of a specific channel. I think JSON with Python or Java would be a good choice. I can get the newest video with the code below, but how can I get ALL video links (>500)?

import json
import urllib.request

author = 'Youtube_Username'
inp = urllib.request.urlopen('http://gdata.youtube.com/feeds/api/videos?max-results=1&alt=json&orderby=published&author=' + author)
resp = json.load(inp)
inp.close()
first = resp['feed']['entry'][0]
print(first['title'])            # video title
print(first['link'][0]['href'])  # url

Upvotes: 19

Views: 56427

Answers (9)

dermasmid

Reputation: 514

Short answer:

There is a Python library that can help with that:

pip install scrapetube

import scrapetube

videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")  # channel ID

for video in videos:
    print(video['videoId'])

Long answer:

I created the module mentioned above due to a lack of any other solutions. Here's what I tried:

  • Selenium. It worked but had three big drawbacks:

    1. It requires a web browser and driver to be installed.
    2. It has high CPU and memory requirements.
    3. It can't handle big channels.
  • Using youtube-dl, like this:

import youtube_dl

youtube_dl_options = {
    'skip_download': True,
    'ignoreerrors': True
}

channel_id = 'UC9-y-6csu5WGm29I7JiwpnA'  # example channel ID
with youtube_dl.YoutubeDL(youtube_dl_options) as ydl:
    videos = ydl.extract_info(f'https://www.youtube.com/channel/{channel_id}/videos')

This also works for small channels, but for bigger ones I would get blocked by YouTube for making too many requests in such a short time (because youtube-dl downloads more info for every video in the channel); one way around that is sketched below.
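
For what it's worth, youtube-dl (and its fork yt-dlp) also have an 'extract_flat' option that skips those per-video requests. A rough sketch of that approach (the entry field name is my assumption, and the channel ID is just the example from above):

import youtube_dl

# 'extract_flat' tells youtube-dl not to resolve each video in the listing,
# so only the channel page itself is requested.
youtube_dl_options = {
    'skip_download': True,
    'ignoreerrors': True,
    'extract_flat': True,
}

channel_id = 'UC9-y-6csu5WGm29I7JiwpnA'  # example channel ID
with youtube_dl.YoutubeDL(youtube_dl_options) as ydl:
    info = ydl.extract_info(
        f'https://www.youtube.com/channel/{channel_id}/videos',
        download=False)

for entry in info['entries']:
    print(entry['id'])  # assumed field name for the video ID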

So I decided to make the library scrapetube, which uses the web API to get all the videos. See the project's documentation for details.

Upvotes: 27

SReza S

Reputation: 192

As mentioned in this GitHub issue, you can use yt-dlp to print the channel's video IDs, one per line.

  1. Install yt-dlp with pip install yt-dlp.
  2. Run this command: yt-dlp --flat-playlist --print id <channel url>.

I couldn't figure out how to use the yt-dlp library directly from code, but you can use subprocess to run the command and collect the output as a list:

import subprocess

def get_video_ids(channel_url):
    command = ["yt-dlp", "--flat-playlist", "--print", "id", channel_url]

    result = subprocess.run(
        command, capture_output=True, text=True, check=True
    )

    video_ids = result.stdout.strip().split("\n")
    return video_ids

channel_url = "https://www.youtube.com/..."
video_ids = get_video_ids(channel_url)
print(video_ids)

Upvotes: 0

Radovici

Reputation: 31

I made some further improvements: the channel URL can be entered in the console, and the result is printed on screen and also saved to an external file called "_list.txt".

import scrapetube
import sys

path = '_list.txt'

print('**********************\n')
print("The result will be saved in '_list.txt' file.")
print("Enter Channel ID:")

# Prints the output in the console and into the '_list.txt' file.
class Logger:
 
    def __init__(self, filename):
        self.console = sys.stdout
        self.file = open(filename, 'w')
 
    def write(self, message):
        self.console.write(message)
        self.file.write(message)
 
    def flush(self):
        self.console.flush()
        self.file.flush()

sys.stdout = Logger(path)

# Accept either a bare channel ID or a full "https://www.youtube.com/channel/..." URL.
# str.removeprefix (Python 3.9+) strips the URL prefix if present.
channel_id_input = input()
channel_id = channel_id_input.removeprefix("https://www.youtube.com/channel/")

videos = scrapetube.get_channel(channel_id)

for video in videos:
    print("https://www.youtube.com/watch?v="+str(video['videoId']))
#    print(video['videoId'])

Upvotes: 2

Radovici

Reputation: 31

I modified the script originally posted by dermasmid to fit my needs. This is the result:

import scrapetube
import sys

path = '_list.txt'
sys.stdout = open(path, 'w')

videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")

for video in videos:
    print("https://www.youtube.com/watch?v="+str(video['videoId']))
#    print(video['videoId'])

Basically it saves all the URLs from the channel into a "_list.txt" file. I am using this "_list.txt" file to download all the videos with yt-dlp.exe. All the downloaded files have the .mp4 extension.

Now I need to create another "_playlist.txt" file that contains all the FILENAMES corresponding to each URL from "_list.txt".

For example, for "https://www.youtube.com/watch?v=yG1m7oGZC48", "_playlist.txt" should contain "Apple M1 Ultra & NUMA - Computerphile.mp4".
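
A possible sketch for that step (my own, not from the original answer): it assumes yt-dlp is on PATH and uses its --print option to fetch each title without downloading anything, then appends .mp4 to match the downloaded files.

import subprocess

# Read URLs from _list.txt and write one filename per line to _playlist.txt.
with open('_list.txt') as urls, open('_playlist.txt', 'w') as playlist:
    for url in urls:
        url = url.strip()
        if not url:
            continue
        # '--print title' makes yt-dlp print the video title without downloading.
        result = subprocess.run(
            ['yt-dlp', '--print', 'title', url],
            capture_output=True, text=True, check=True
        )
        title = result.stdout.strip()
        playlist.write(title + '.mp4\n')  # .mp4 assumed, matching the answer above

Note that yt-dlp sanitizes titles when it builds real filenames, so if the names must match the downloaded files exactly, --print filename together with your usual -o template may be the safer choice.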

Upvotes: 1

Derek Muller

Reputation: 93

Using Selenium Chrome Driver:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import time

driverPath = ChromeDriverManager().install()

driver = webdriver.Chrome(service=Service(driverPath))

url = 'https://www.youtube.com/howitshouldhaveended/videos'

driver.get(url)

# Scroll until the page height stops growing, i.e. no more videos are loading.
height = driver.execute_script("return document.documentElement.scrollHeight")
previousHeight = -1

while previousHeight < height:
    previousHeight = height
    driver.execute_script(f'window.scrollTo(0,{height + 10000})')
    time.sleep(1)
    height = driver.execute_script("return document.documentElement.scrollHeight")

# Collect the href of every video thumbnail on the page.
vidElements = driver.find_elements(By.ID, 'thumbnail')
vid_urls = []
for v in vidElements:
    vid_urls.append(v.get_attribute('href'))

This code has worked the few times I've tried it; however, you might need to tweak the sleep time, or add a way to recognize when the browser is still loading (one possible approach is sketched below). It easily handled a channel with 300+ videos, but struggled with one that had 7000+ because the time needed to load new videos became inconsistent.
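
One possible tweak along those lines (my sketch, not part of the original answer) is to allow a few retries before concluding that the page has stopped growing, so a slow load does not end the scroll loop early:

import time

def scroll_to_bottom(driver, pause=1.0, max_retries=5):
    # Scroll until the page height stops growing, retrying a few times
    # in case a batch of videos is still loading.
    retries = 0
    last_height = driver.execute_script(
        "return document.documentElement.scrollHeight")
    while retries < max_retries:
        driver.execute_script(f'window.scrollTo(0, {last_height + 10000})')
        time.sleep(pause)
        new_height = driver.execute_script(
            "return document.documentElement.scrollHeight")
        if new_height == last_height:
            retries += 1   # height unchanged; the page may still be loading
        else:
            retries = 0    # the page grew, so keep scrolling
            last_height = new_height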

Upvotes: 1

Gajendra D Ambi

Reputation: 4233

An independent way of doing things: no API, no rate limit.

import requests

username = "marquesbrownlee"
url = f"https://www.youtube.com/user/{username}/videos"
page = requests.get(url).text
data = page.split(' ')
item = 'href="/watch?'
vids = [line.replace('href="', 'youtube.com') for line in data if item in line]  # all videos, each listed twice
print(vids[0])  # the latest video

Note that the code above scrapes only a limited number of video URLs, up to about 60 (and each video is listed twice), rather than every video URL in the channel, because the remaining videos are loaded dynamically as you scroll. Suggestions for grabbing all of them would be welcome.

Upvotes: 4

Stian

Reputation: 784

After the YouTube API change, max k.'s answer does not work. As a replacement, the function below returns a list of the videos in a given channel. Note that you need an API key for it to work.

import json
import urllib.request

def get_all_video_in_channel(channel_id):
    api_key = 'YOUR_API_KEY'  # replace with your own API key

    base_video_url = 'https://www.youtube.com/watch?v='
    base_search_url = 'https://www.googleapis.com/youtube/v3/search?'

    first_url = base_search_url + 'key={}&channelId={}&part=snippet,id&order=date&maxResults=25'.format(api_key, channel_id)

    video_links = []
    url = first_url
    while True:
        inp = urllib.request.urlopen(url)
        resp = json.load(inp)
        inp.close()

        for i in resp['items']:
            if i['id']['kind'] == "youtube#video":
                video_links.append(base_video_url + i['id']['videoId'])

        try:
            next_page_token = resp['nextPageToken']
            url = first_url + '&pageToken={}'.format(next_page_token)
        except KeyError:
            # no nextPageToken means we've reached the last page
            break
    return video_links
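
Usage is then a single call; the channel ID below is just the example used in other answers here:

video_links = get_all_video_in_channel('UC9-y-6csu5WGm29I7JiwpnA')
print(len(video_links))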

Upvotes: 18

dSebastien

Reputation: 2023

Based on the code found here and in a few other places, I've written a small script that does this. My script uses v3 of YouTube's API and does not run into the 500-result limit that Google has set for searches.

The code is available over at GitHub: https://github.com/dsebastien/youtubeChannelVideosFinder
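
For context, the usual way to avoid that search cap with the v3 API is to page through the channel's "uploads" playlist instead of the search endpoint. A minimal sketch of that approach (my illustration, not the linked script; it assumes a valid API key):

import json
import urllib.request

API_KEY = 'YOUR_API_KEY'  # assumed: a valid YouTube Data API v3 key

def get_uploads_playlist_id(channel_id):
    # Every channel has an "uploads" playlist containing all its public videos.
    url = ('https://www.googleapis.com/youtube/v3/channels'
           '?key={}&id={}&part=contentDetails'.format(API_KEY, channel_id))
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data['items'][0]['contentDetails']['relatedPlaylists']['uploads']

def get_all_upload_links(channel_id):
    playlist_id = get_uploads_playlist_id(channel_id)
    video_links = []
    page_token = None
    while True:
        url = ('https://www.googleapis.com/youtube/v3/playlistItems'
               '?key={}&playlistId={}&part=contentDetails&maxResults=50'
               .format(API_KEY, playlist_id))
        if page_token:
            url += '&pageToken=' + page_token
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        for item in data['items']:
            video_links.append('https://www.youtube.com/watch?v='
                               + item['contentDetails']['videoId'])
        page_token = data.get('nextPageToken')
        if not page_token:
            return video_links

Unlike search, playlist paging is not subject to the 500-result cap, which is what makes this route work for large channels.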

Upvotes: 7

max k.

Reputation: 569

Increase max-results from 1 to however many you want, but beware: they advise against grabbing too many in one call, and the API limits you to 50 (https://developers.google.com/youtube/2.0/developers_guide_protocol_api_query_parameters).

Instead, you could grab the data in batches of 25, say, by increasing start-index until no more results come back.

EDIT: Here's the code for how I would do it

import json
import urllib.request

author = 'Youtube_Username'

foundAll = False
ind = 1
videos = []
while not foundAll:
    inp = urllib.request.urlopen('http://gdata.youtube.com/feeds/api/videos?start-index={0}&max-results=50&alt=json&orderby=published&author={1}'.format(ind, author))
    try:
        resp = json.load(inp)
        inp.close()
        returnedVideos = resp['feed']['entry']
        for video in returnedVideos:
            videos.append(video)

        ind += 50
        print(len(videos))
        if len(returnedVideos) < 50:
            foundAll = True
    except KeyError:
        # catch the case where the number of videos in the channel is a multiple of 50
        foundAll = True

for video in videos:
    print(video['title'])            # video title
    print(video['link'][0]['href'])  # url

Upvotes: 13
