Pierre de Fermat

Reputation: 41

Youtube Search on Python 3.8

I'm having trouble searching for links on YouTube. I tried google-api-python-client along with the sample code converted to Python 3, the code below, and the youtube-search module hosted on PyPI. I checked the API key, and that wasn't the problem. The code raised a bs4 error; once I fixed it, the code returned an empty list with no links.

Can anyone tell me a way to search YouTube and get the links using Python? Thank you very much for your help. I am using Python 3.8.3 and Windows 10 x86.

youtube-search module: https://pypi.org/project/youtube-search/

Sample code: https://developers.google.com/youtube/v3/code_samples/python?hl=pt-br (Python 2.x; I tried converting it to Python 3.x)

My code with bs4 and urllib:

import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

textToSearch = 'hello world'
query = urllib.parse.quote(textToSearch)
url = "https://www.youtube.com/results?search_query=" + query
response = urllib.request.urlopen(url)
html = response.read()
soup = BeautifulSoup(html, 'html.parser')
print(soup.findAll(attrs={"class": "yt-uix-tile-link"}))

for vid in soup.findAll(attrs={'class':'yt-uix-tile-link'}):
    print('https://www.youtube.com' + vid['href'])

Upvotes: 3

Views: 19960

Answers (6)

Dmitriy Zub

Reputation: 1724

The manual solution in Andrej Kesely's answer no longer works and returns an empty list ([]). YouTube now renders its results with JavaScript, and BeautifulSoup only parses the static HTML it is given; it does not execute JavaScript.

To scrape YouTube manually, you need to either extract the data from the page source with a regex or use Selenium.
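
For the regex route, here is a minimal sketch. It assumes the page source embeds each result as a "videoId":"..." pair with an 11-character ID, which is how the page looked when I checked but is not a documented format and may change. The function name extract_video_links and the sample string are made up for illustration.

```python
import re

# Assumption: each result appears in the page's embedded JSON as
# "videoId":"<11 characters>". This is not a documented format.
VIDEO_ID_PATTERN = re.compile(r'"videoId":"([A-Za-z0-9_-]{11})"')


def extract_video_links(page_source):
    """Return unique watch URLs in order of first appearance."""
    links = []
    seen = set()
    for video_id in VIDEO_ID_PATTERN.findall(page_source):
        # the same ID usually occurs several times per result
        if video_id not in seen:
            seen.add(video_id)
            links.append('https://www.youtube.com/watch?v=' + video_id)
    return links


# Hypothetical snippet standing in for the real page source
sample = '{"videoId":"K4DyBUG242c","x":1},{"videoId":"K4DyBUG242c"},{"videoId":"yJg-Y5byMMw"}'
print(extract_video_links(sample))
```

To use it on a live page, fetch the results page (for example with requests.get(...).text) and pass the text to extract_video_links. Only the videos present in the initial page load will be found this way.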


The following Selenium code scrolls through all YouTube results until it hits "No more results" at the very bottom of the page, then extracts the links.

Code and full example that scrapes and shows more:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time


def get_video_results():
    driver = webdriver.Chrome()
    driver.get('https://www.youtube.com/results?search_query=minecraft')

    youtube_data = []

    # scrolling to the end of the page
    # https://stackoverflow.com/a/57076690/15164646
    while True:
        # end_result = "No more results" string at the bottom of the page
        # this will be used to break out of the while loop
        end_result = driver.find_element(By.CSS_SELECTOR, '#message').is_displayed()
        driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")
        # time.sleep(1) # could be removed

        # once the element is displayed, break out of the loop
        if end_result:
            break

    print('Extracting results. It might take a while...')

    # iterate over all elements and extract link
    for result in driver.find_elements(By.CSS_SELECTOR, '.text-wrapper.style-scope.ytd-video-renderer'):
        link = result.find_element(By.CSS_SELECTOR, '.title-and-badge.style-scope.ytd-video-renderer a').get_attribute('href')
        youtube_data.append(link)

    driver.quit()
    return youtube_data

# prints all found links
print(get_video_results())

Alternatively, you can use the YouTube Video Results API from SerpApi. It's a paid API with a free plan.

The main difference, in this case, is that you don't have to deal with the JavaScript-rendered page or figure out how to scrape data from the page source, which helps if you need output quickly.

The following code doesn't scrape all video results, but the YouTube Video Results API supports continuous pagination and async requests.

Code to integrate:

from serpapi import GoogleSearch
import os

params = {
  "api_key": os.getenv("API_KEY"),
  "engine": "youtube",
  "search_query": "minecraft"
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results['video_results']:
    link = result['link']
    print(link)

# https://www.youtube.com/watch?v=hjV30hf6yEM
# ... other links

P.S. I wrote two blog posts: one about how to scrape YouTube Search video results (video, channel, ad results) and another about how to scrape playlist, movie, and category results from YouTube Search.

Disclaimer: I work for SerpApi.

Upvotes: 1

Tom Sumner

Reputation: 11

As of 11/03/2022, none of the code on this page works. I have, however, updated it so that it works again.

To get the first 24 results:

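from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


def get_video_results():
    # Selenium 4 takes the driver path through a Service object
    driver = webdriver.Chrome(service=Service("path/to/chromedriver.exe"))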
    driver.get('https://www.youtube.com/results?search_query=minecraft')
    driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")
 
    youtube_data = []

    # iterate over all elements and extract link
    for result in driver.find_elements(By.CSS_SELECTOR, '.text-wrapper.style-scope.ytd-video-renderer'):
        link = result.find_element(By.CSS_SELECTOR, '.title-and-badge.style-scope.ytd-video-renderer a').get_attribute('href')
        youtube_data.append(link)

    return youtube_data

print(get_video_results())

Upvotes: 1

Faisal Afroz

Reputation: 613

This is how I get the links from search results uploaded in the last hour.

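from youtubesearchpython import CustomSearch, VideoUploadDateFilter

customSearch = CustomSearch('Your Keyword Here', VideoUploadDateFilter.lastHour, limit = 20)

# iterate over whatever came back; there may be fewer than 20 results
for video in customSearch.result()['result']:
    print(video['link'])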

For reference: Youtube Search Python without Data Api v3

Upvotes: 3

Hitesh

Reputation: 435

There is a similar module that fits your requirement (it supports both sync and async):

https://github.com/alexmercerind/youtube-search-python

You can use it in the following way:

Example

from youtubesearchpython import VideosSearch

videosSearch = VideosSearch('NoCopyrightSounds', limit = 2)

print(videosSearch.result())

Result

{
    "result": [
        {
            "type": "video",
            "id": "K4DyBUG242c",
            "title": "Cartoon - On & On (feat. Daniel Levi) [NCS Release]",
            "publishedTime": "5 years ago",
            "duration": "3:28",
            "viewCount": {
                "text": "389,673,774 views",
                "short": "389M views"
            },
            "thumbnails": [
                {
                    "url": "https://i.ytimg.com/vi/K4DyBUG242c/hqdefault.jpg?sqp=-oaymwEjCOADEI4CSFryq4qpAxUIARUAAAAAGAElAADIQj0AgKJDeAE=&rs=AOn4CLBkTusCwcZQlmVAaRQ5rH-mvBuA1g",
                    "width": 480,
                    "height": 270
                }
            ],
            "descriptionSnippet": [
                {
                    "text": "NCS: Music Without Limitations NCS Spotify: http://spoti.fi/NCS Free Download / Stream: http://ncs.io/onandon \u25bd Connect with\u00a0..."
                }
            ],
            "channel": {
                "name": "NoCopyrightSounds",
                "id": "UC_aEa8K-EOJ3D6gOs7HcyNg",
                "thumbnails": [
                    {
                        "url": "https://yt3.ggpht.com/a-/AOh14GhS0G5FwV8rMhVCUWSDp36vWEvnNs5Vl97Zww=s68-c-k-c0x00ffffff-no-rj-mo",
                        "width": 68,
                        "height": 68
                    }
                ],
                "link": "https://www.youtube.com/channel/UC_aEa8K-EOJ3D6gOs7HcyNg"
            },
            "accessibility": {
                "title": "Cartoon - On & On (feat. Daniel Levi) [NCS Release] by NoCopyrightSounds 5 years ago 3 minutes, 28 seconds 389,673,774 views",
                "duration": "3 minutes, 28 seconds"
            },
            "link": "https://www.youtube.com/watch?v=K4DyBUG242c",
            "shelfTitle": null
        },
        {
            "type": "video",
            "id": "yJg-Y5byMMw",
            "title": "Warriyo - Mortals (feat. Laura Brehm) [NCS Release]",
            "publishedTime": "3 years ago",
            "duration": "3:50",
            "viewCount": {
                "text": "153,353,801 views",
                "short": "153M views"
            },
            "thumbnails": [
                {
                    "url": "https://i.ytimg.com/vi/yJg-Y5byMMw/hqdefault.jpg?sqp=-oaymwEjCOADEI4CSFryq4qpAxUIARUAAAAAGAElAADIQj0AgKJDeAE=&rs=AOn4CLDY-mve79IweErMo-71AsKEIB1m0A",
                    "width": 480,
                    "height": 270
                }
            ],
            "descriptionSnippet": [
                {
                    "text": "NCS: Music Without Limitations NCS Spotify: http://spoti.fi/NCS Free Download / Stream: http://ncs.io/mortals Connect with NCS:\u00a0..."
                }
            ],
            "channel": {
                "name": "NoCopyrightSounds",
                "id": "UC_aEa8K-EOJ3D6gOs7HcyNg",
                "thumbnails": [
                    {
                        "url": "https://yt3.ggpht.com/a-/AOh14GhS0G5FwV8rMhVCUWSDp36vWEvnNs5Vl97Zww=s68-c-k-c0x00ffffff-no-rj-mo",
                        "width": 68,
                        "height": 68
                    }
                ],
                "link": "https://www.youtube.com/channel/UC_aEa8K-EOJ3D6gOs7HcyNg"
            },
            "accessibility": {
                "title": "Warriyo - Mortals (feat. Laura Brehm) [NCS Release] by NoCopyrightSounds 3 years ago 3 minutes, 50 seconds 153,353,801 views",
                "duration": "3 minutes, 50 seconds"
            },
            "link": "https://www.youtube.com/watch?v=yJg-Y5byMMw",
            "shelfTitle": null
        }
    ]
}

The search result from this library is very detailed.
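
Since the question only asks for the links, here is a small sketch that pulls them out of a result shaped like the output above. The helper name extract_links and the trimmed sample dictionary are illustrative only; in practice you would pass videosSearch.result() instead.

```python
def extract_links(search_result):
    """Return (title, link) pairs for entries whose type is 'video'."""
    return [
        (item['title'], item['link'])
        for item in search_result.get('result', [])
        if item.get('type') == 'video'
    ]


# Trimmed copy of the output shown above, keeping only the fields used here
sample_result = {
    'result': [
        {
            'type': 'video',
            'title': 'Cartoon - On & On (feat. Daniel Levi) [NCS Release]',
            'link': 'https://www.youtube.com/watch?v=K4DyBUG242c',
        },
        {
            'type': 'video',
            'title': 'Warriyo - Mortals (feat. Laura Brehm) [NCS Release]',
            'link': 'https://www.youtube.com/watch?v=yJg-Y5byMMw',
        },
    ]
}

for title, link in extract_links(sample_result):
    print(title, link)
```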

Upvotes: 3

shekhar chander

Reputation: 618

There is another module: Searchtube.

pip install searchtube

from searchtube import Search
print(Search('hello',filter='udlh').results)

This module also provides filter options. Give it a try. More information at https://github.com/shekharchander/searchtube

Upvotes: 1

Andrej Kesely

Reputation: 195438

To get the correct response from YouTube, set an appropriate User-Agent HTTP header.

For example:

import requests
from bs4 import BeautifulSoup


headers = {'User-Agent': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'}

textToSearch = 'hello world'
url = 'https://www.youtube.com/results'

response = requests.get(url, params={'search_query': textToSearch}, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

print(soup.findAll(attrs={"class": "yt-uix-tile-link"}))
for vid in soup.findAll(attrs={'class':'yt-uix-tile-link'}):
    print('https://www.youtube.com' + vid['href'])

Prints:

[<a aria-describedby="description-id-498021" class="yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2 yt-uix-sessionlink spf-link" data-sessionlink="itct=CFsQ3DAYACITCLSOm7bRu-oCFQhO4AodS0kHqzIGc2VhcmNoUgtoZWxsbyB3b3JsZJoBAxD0JA" dir="ltr" href="/watch?v=Yw6u6YkTgQ4" rel="spf-prefetch" title="hello world">hello world</a>, <a aria-describedby="description-id-20311" class="yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2 yt-uix-sessionlink spf-link" data-sessionlink="itct=CFoQ3DAYASITCLSOm7bRu-oCFQhO4AodS0kHqzIGc2VhcmNoUgtoZWxsbyB3b3JsZJoBAxD0JA" dir="ltr" href="/watch?v=al2DFQEZl4M" rel="spf-prefetch" title="Lady Antebellum - Hello World">Lady Antebellum - Hello World</a>, ...
https://www.youtube.com/watch?v=Yw6u6YkTgQ4
https://www.youtube.com/watch?v=al2DFQEZl4M
https://www.youtube.com/watch?v=OfaBZvvL_7M
https://www.youtube.com/watch?v=rOU4YiuaxAM
https://www.youtube.com/watch?v=MF5qMW6AIvo
https://www.youtube.com/watch?v=zeQTrWU1RlU&list=PLqq4LnWs3olU-bP2R9uD8YXbt02JjocOk
https://www.youtube.com/watch?v=mFrghyAyNTg
https://www.youtube.com/watch?v=82vOw3l2DmY
https://www.youtube.com/watch?v=GxPNprgqR48

... and so on.

Upvotes: 1
