Reputation: 41
I'm having trouble searching for links on YouTube. I tried google-api-python-client along with the sample code converted to Python 3, I tried the code below, and I tried the youtube-search module hosted on PyPI; I also checked the API key, and that wasn't the problem. The code below returns a bs4 error that I tried to solve, and when I do solve it, it returns an empty list with no links.
Can anyone tell me a way to do a search on YouTube and get the links using Python? Thank you very much for your help. I am using Python 3.8.3 and Windows 10 x86.
youtube-search module: https://pypi.org/project/youtube-search/
sample code: https://developers.google.com/youtube/v3/code_samples/python?hl=pt-br (Python 2.x; I tried converting it to Python 3.x, roughly as sketched below)
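For reference, a rough Python 3 sketch of the Data API approach (the key is a placeholder; it searches and prints the video links):
from googleapiclient.discovery import build

# 'MY_API_KEY' is a placeholder for a real Data API v3 key
youtube = build('youtube', 'v3', developerKey='MY_API_KEY')

request = youtube.search().list(
    q='hello world',
    part='id,snippet',
    type='video',
    maxResults=10
)
response = request.execute()

# each search result carries the video ID, from which the link is built
for item in response['items']:
    print('https://www.youtube.com/watch?v=' + item['id']['videoId'])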
My code with bs4 and urllib:
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup
textToSearch = 'hello world'
query = urllib.parse.quote(textToSearch)
url = "https://www.youtube.com/results?search_query=" + query
response = urllib.request.urlopen(url)
html = response.read()
soup = BeautifulSoup(html, 'html.parser')
print(soup.findAll(attrs={"class": "yt-uix-tile-link"}))
for vid in soup.findAll(attrs={'class': 'yt-uix-tile-link'}):
    print('https://www.youtube.com' + vid['href'])
Upvotes: 3
Views: 19960
Reputation: 1724
The answer from Andrej Kesely with a manual beautifulsoup solution no longer works and returns an empty list []. That's because YouTube is rendered via JavaScript, and beautifulsoup can't scrape JavaScript. To scrape YouTube manually you need to either extract the data from the page source with regex or use selenium.
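For the regex route, here is a rough sketch. It assumes the results page still embeds video IDs as "videoId":"..." inside its inline JSON (this markup can change at any time):
import re
import urllib.parse
import urllib.request

query = urllib.parse.quote('minecraft')
url = 'https://www.youtube.com/results?search_query=' + query

# a desktop User-Agent so YouTube serves the regular results page
request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(request).read().decode('utf-8')

# video IDs show up as "videoId":"<11 characters>" in the embedded JSON
video_ids = re.findall(r'"videoId":"([A-Za-z0-9_-]{11})"', html)

# drop duplicates while preserving order, then build the links
seen = set()
for video_id in video_ids:
    if video_id not in seen:
        seen.add(video_id)
        print('https://www.youtube.com/watch?v=' + video_id)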
The following selenium code will scrape YouTube results until it hits the "No more results" message at the very bottom of the page.
Code:
from selenium import webdriver
import time

def get_video_results():
    driver = webdriver.Chrome()
    driver.get('https://www.youtube.com/results?search_query=minecraft')

    youtube_data = []

    # scrolling to the end of the page
    # https://stackoverflow.com/a/57076690/15164646
    while True:
        # end_result = "No more results" string at the bottom of the page
        # this will be used to break out of the while loop
        end_result = driver.find_element_by_css_selector('#message').is_displayed()

        driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")
        # time.sleep(1) # could be removed

        # once the element is located, break out of the loop
        if end_result:
            break

    print('Extracting results. It might take a while...')

    # iterate over all elements, extract the link and collect it
    for result in driver.find_elements_by_css_selector('.text-wrapper.style-scope.ytd-video-renderer'):
        link = result.find_element_by_css_selector('.title-and-badge.style-scope.ytd-video-renderer a').get_attribute('href')
        youtube_data.append(link)
        print(link)

    return youtube_data

get_video_results()  # prints all found links
Alternatively, you can use YouTube Video Results API from SerpApi. It's a paid API with a free plan.
The main difference in this case is that you don't have to deal with the JavaScript-rendered page or figure out how to scrape data from the page source, which helps if you need fast output.
The following code doesn't scrape all video results, but the YouTube Video Results API supports continuous pagination and async requests; see the pagination sketch after the code.
Code to integrate:
from serpapi import GoogleSearch
import os

params = {
    "api_key": os.getenv("API_KEY"),
    "engine": "youtube",
    "search_query": "minecraft"
}

search = GoogleSearch(params)
results = search.get_dict()

for video in results['video_results']:
    link = video['link']
    print(link)
    # https://www.youtube.com/watch?v=hjV30hf6yEM
    # ... other links
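For the continuous pagination mentioned above, a rough sketch; I'm assuming the response exposes a serpapi_pagination block with a next_page_token that is passed back via the sp parameter, so check the API docs for the exact field names:
from serpapi import GoogleSearch
import os

params = {
    "api_key": os.getenv("API_KEY"),
    "engine": "youtube",
    "search_query": "minecraft"
}

all_links = []

while True:
    results = GoogleSearch(params).get_dict()

    for video in results.get('video_results', []):
        all_links.append(video['link'])

    # assumed pagination fields; verify against the API documentation
    pagination = results.get('serpapi_pagination', {})
    if 'next_page_token' not in pagination:
        break
    params['sp'] = pagination['next_page_token']

print(all_links)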
P.S. - I wrote two blog posts: one about how to scrape YouTube Search video results (video, channel, ad results) and another about how to scrape playlist, movie, and category results from YouTube Search.
Disclaimer, I work for SerpApi.
Upvotes: 1
Reputation: 11
As of 11/03/2022, none of the code on this page works as posted; I have, however, edited and updated the code so that it works now.
To get the first 24 results:
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
def get_video_results():
    driver = webdriver.Chrome("path/to/chromedriver.exe")
    driver.get('https://www.youtube.com/results?search_query=minecraft')
    driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")

    youtube_data = []

    # iterate over all elements and extract the link
    for result in driver.find_elements(By.CSS_SELECTOR, '.text-wrapper.style-scope.ytd-video-renderer'):
        link = result.find_element(By.CSS_SELECTOR, '.title-and-badge.style-scope.ytd-video-renderer a').get_attribute('href')
        youtube_data.append(link)

    return youtube_data

print(get_video_results())
Upvotes: 1
Reputation: 613
This is how I implemented getting links from search results uploaded in the last hour:
from youtubesearchpython import *
customSearch = CustomSearch('Your Keyword Here', VideoUploadDateFilter.lastHour, limit = 20)
for i in range(20):
    print(customSearch.result()['result'][i]['link'])
For Reference : Youtube Search Python without Data Api v3
Upvotes: 3
Reputation: 435
There is a similar module for your requirement (supports both async & sync):
https://github.com/alexmercerind/youtube-search-python
You can use it in the following way:
from youtubesearchpython import VideosSearch
videosSearch = VideosSearch('NoCopyrightSounds', limit = 2)
print(videosSearch.result())
Example output:
{
"result": [
{
"type": "video",
"id": "K4DyBUG242c",
"title": "Cartoon - On & On (feat. Daniel Levi) [NCS Release]",
"publishedTime": "5 years ago",
"duration": "3:28",
"viewCount": {
"text": "389,673,774 views",
"short": "389M views"
},
"thumbnails": [
{
"url": "https://i.ytimg.com/vi/K4DyBUG242c/hqdefault.jpg?sqp=-oaymwEjCOADEI4CSFryq4qpAxUIARUAAAAAGAElAADIQj0AgKJDeAE=&rs=AOn4CLBkTusCwcZQlmVAaRQ5rH-mvBuA1g",
"width": 480,
"height": 270
}
],
"descriptionSnippet": [
{
"text": "NCS: Music Without Limitations NCS Spotify: http://spoti.fi/NCS Free Download / Stream: http://ncs.io/onandon \u25bd Connect with\u00a0..."
}
],
"channel": {
"name": "NoCopyrightSounds",
"id": "UC_aEa8K-EOJ3D6gOs7HcyNg",
"thumbnails": [
{
"url": "https://yt3.ggpht.com/a-/AOh14GhS0G5FwV8rMhVCUWSDp36vWEvnNs5Vl97Zww=s68-c-k-c0x00ffffff-no-rj-mo",
"width": 68,
"height": 68
}
],
"link": "https://www.youtube.com/channel/UC_aEa8K-EOJ3D6gOs7HcyNg"
},
"accessibility": {
"title": "Cartoon - On & On (feat. Daniel Levi) [NCS Release] by NoCopyrightSounds 5 years ago 3 minutes, 28 seconds 389,673,774 views",
"duration": "3 minutes, 28 seconds"
},
"link": "https://www.youtube.com/watch?v=K4DyBUG242c",
"shelfTitle": null
},
{
"type": "video",
"id": "yJg-Y5byMMw",
"title": "Warriyo - Mortals (feat. Laura Brehm) [NCS Release]",
"publishedTime": "3 years ago",
"duration": "3:50",
"viewCount": {
"text": "153,353,801 views",
"short": "153M views"
},
"thumbnails": [
{
"url": "https://i.ytimg.com/vi/yJg-Y5byMMw/hqdefault.jpg?sqp=-oaymwEjCOADEI4CSFryq4qpAxUIARUAAAAAGAElAADIQj0AgKJDeAE=&rs=AOn4CLDY-mve79IweErMo-71AsKEIB1m0A",
"width": 480,
"height": 270
}
],
"descriptionSnippet": [
{
"text": "NCS: Music Without Limitations NCS Spotify: http://spoti.fi/NCS Free Download / Stream: http://ncs.io/mortals Connect with NCS:\u00a0..."
}
],
"channel": {
"name": "NoCopyrightSounds",
"id": "UC_aEa8K-EOJ3D6gOs7HcyNg",
"thumbnails": [
{
"url": "https://yt3.ggpht.com/a-/AOh14GhS0G5FwV8rMhVCUWSDp36vWEvnNs5Vl97Zww=s68-c-k-c0x00ffffff-no-rj-mo",
"width": 68,
"height": 68
}
],
"link": "https://www.youtube.com/channel/UC_aEa8K-EOJ3D6gOs7HcyNg"
},
"accessibility": {
"title": "Warriyo - Mortals (feat. Laura Brehm) [NCS Release] by NoCopyrightSounds 3 years ago 3 minutes, 50 seconds 153,353,801 views",
"duration": "3 minutes, 50 seconds"
},
"link": "https://www.youtube.com/watch?v=yJg-Y5byMMw",
"shelfTitle": null
}
]
}
The search result from this library is very detailed.
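The library also supports async, as noted at the top of this answer. Here is a minimal sketch of the asynchronous variant, assuming the async classes live under youtubesearchpython.__future__ and expose an awaitable next() as the project README describes (verify against the release you install):
import asyncio
from youtubesearchpython.__future__ import VideosSearch  # assumed async import path

async def main():
    videosSearch = VideosSearch('NoCopyrightSounds', limit = 2)
    result = await videosSearch.next()  # fetches the first page of results
    for video in result['result']:
        print(video['link'])

asyncio.run(main())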
Upvotes: 3
Reputation: 618
There is another module: searchtube.
pip install searchtube
from searchtube import Search

print(Search('hello', filter='udlh').results)
This module also offers filter options on top of what the other modules provide. Give it a try. More information at https://github.com/shekharchander/searchtube
Upvotes: 1
Reputation: 195438
To get a correct response from YouTube, set a correct User-Agent HTTP header.
For example:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'}
textToSearch = 'hello world'
url = 'https://www.youtube.com/results'
response = requests.get(url, params={'search_query': textToSearch}, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.findAll(attrs={"class": "yt-uix-tile-link"}))
for vid in soup.findAll(attrs={'class': 'yt-uix-tile-link'}):
    print('https://www.youtube.com' + vid['href'])
Prints:
[<a aria-describedby="description-id-498021" class="yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2 yt-uix-sessionlink spf-link" data-sessionlink="itct=CFsQ3DAYACITCLSOm7bRu-oCFQhO4AodS0kHqzIGc2VhcmNoUgtoZWxsbyB3b3JsZJoBAxD0JA" dir="ltr" href="/watch?v=Yw6u6YkTgQ4" rel="spf-prefetch" title="hello world">hello world</a>, <a aria-describedby="description-id-20311" class="yt-uix-tile-link yt-ui-ellipsis yt-ui-ellipsis-2 yt-uix-sessionlink spf-link" data-sessionlink="itct=CFoQ3DAYASITCLSOm7bRu-oCFQhO4AodS0kHqzIGc2VhcmNoUgtoZWxsbyB3b3JsZJoBAxD0JA" dir="ltr" href="/watch?v=al2DFQEZl4M" rel="spf-prefetch" title="Lady Antebellum - Hello World">Lady Antebellum - Hello World</a>, ...
https://www.youtube.com/watch?v=Yw6u6YkTgQ4
https://www.youtube.com/watch?v=al2DFQEZl4M
https://www.youtube.com/watch?v=OfaBZvvL_7M
https://www.youtube.com/watch?v=rOU4YiuaxAM
https://www.youtube.com/watch?v=MF5qMW6AIvo
https://www.youtube.com/watch?v=zeQTrWU1RlU&list=PLqq4LnWs3olU-bP2R9uD8YXbt02JjocOk
https://www.youtube.com/watch?v=mFrghyAyNTg
https://www.youtube.com/watch?v=82vOw3l2DmY
https://www.youtube.com/watch?v=GxPNprgqR48
... and so on.
Upvotes: 1