Reputation: 143
Is it possible to pull the auto (non-user) generated video transcripts from any of the YouTube APIs?
Upvotes: 12
Views: 16516
Reputation: 1
There are multiple clients out there that fetch YouTube transcripts. I recommend:
However, YouTube has banned all cloud provider IPs, so if you are looking for a production-ready tool, I recommend using an API. The cheapest option I have found so far is on RapidAPI:
https://rapidapi.com/invideoiq-invideoiq-default/api/video-transcript-scraper
You get 50 free transcripts, and 500,000 for just $9. It retrieves a transcript in less than a second and supports all major platforms (YouTube, X, TikTok, Facebook, etc.).
Upvotes: 0
Reputation: 85
As of Aug 2019, the following method allows you to download transcripts:
Open in Browser:
https://www.youtube.com/watch?v=[Video ID]
From Console type:
JSON.parse(ytplayer.config.args.player_response).captions.playerCaptionsTracklistRenderer.captionTracks[0].baseUrl
UPDATE: As of Jan 2025, JSON.parse is no longer needed:
ytplayer.config.args.raw_player_response.captions.playerCaptionsTracklistRenderer.captionTracks[0].baseUrl
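The property path used in the console can also be followed outside the browser. Below is a minimal Python sketch, assuming the player response has already been obtained and parsed into a dict (how to obtain it is not covered here). The function name first_caption_base_url and the sample data are hypothetical.

```python
from typing import Any, Optional


def first_caption_base_url(player_response: dict[str, Any]) -> Optional[str]:
    """Return captionTracks[0].baseUrl, or None if no caption track is present."""
    tracks = (
        player_response.get("captions", {})
        .get("playerCaptionsTracklistRenderer", {})
        .get("captionTracks", [])
    )
    if not tracks:
        return None
    return tracks[0].get("baseUrl")


# Made-up sample shaped like the structure the console expression walks.
sample = {
    "captions": {
        "playerCaptionsTracklistRenderer": {
            "captionTracks": [
                {"baseUrl": "https://example.com/timedtext?v=VIDEO_ID"}
            ]
        }
    }
}

print(first_caption_base_url(sample))
print(first_caption_base_url({}))  # a video without captions yields None
```

Returning None instead of raising keeps the lookup safe for videos that have no caption tracks, where the console expression would throw.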
Upvotes: 7
Reputation: 606
1 Install youtube-transcript-api (https://github.com/jdepoix/youtube-transcript-api), e.g.:
pip3 install youtube_transcript_api
2 Create youtube_transcript_api-wrapper.py with the following code (based partially on https://stackoverflow.com/a/65325576/2585501):
from youtube_transcript_api import YouTubeTranscriptApi

# Single-video alternative:
#srt = YouTubeTranscriptApi.get_transcript(video_id)

# Read one video ID per line
videoListName = "youtubeVideoIDlist.txt"
with open(videoListName) as f:
    video_ids = f.read().splitlines()

# Fetch all transcripts; IDs that fail are collected in unretrievable_videos
transcript_list, unretrievable_videos = YouTubeTranscriptApi.get_transcripts(video_ids, continue_after_error=True)

for video_id in video_ids:
    if video_id in transcript_list:
        print("\nvideo_id = ", video_id)
        srt = transcript_list.get(video_id)
        # Join the text of every caption segment into one string
        text = ' '.join(i['text'] for i in srt)
        print(text)
3 Create youtubeVideoIDlist.txt containing a list of video_ids, one per line
4 Run: python3 youtube_transcript_api-wrapper.py
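The ID-reading and text-joining steps of the script can be tried on their own, without network access. This is a minimal sketch, assuming each transcript entry is a dict with a 'text' key, as the script's i['text'] lookup implies. The helper names and the sample entries are made up for illustration and are not part of youtube-transcript-api.

```python
def clean_video_ids(lines):
    """Strip whitespace from each line and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]


def transcript_to_text(entries):
    """Join the 'text' field of each entry into one space-separated string."""
    # Caption segments can contain line breaks; flatten them to spaces.
    return " ".join(entry["text"].replace("\n", " ").strip() for entry in entries)


# Made-up sample data in the shape the script iterates over.
sample_entries = [
    {"text": "hello world", "start": 0.0, "duration": 1.5},
    {"text": "this is\na test", "start": 1.5, "duration": 2.0},
]

print(clean_video_ids(["abc123", "", "  def456  "]))
print(transcript_to_text(sample_entries))
```

Dropping blank lines avoids sending empty IDs to the library when the list file ends with a trailing newline or contains gaps.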
Upvotes: 4
Reputation: 13469
You may refer to this thread: How to get "transcript" in youtube-api v3
If you're authenticating with OAuth2, you could do a quick call to this feed:
http://gdata.youtube.com/feeds/api/videos/[VIDEOID]/captiondata/[CAPTIONTRACKID]
to get the data you want. To retrieve a list of possible caption track IDs with v2 of the API, you access this feed:
https://gdata.youtube.com/feeds/api/videos/[VIDEOID]/captions
That feed request also accepts some optional parameters, including language, max-results, etc. For more details, along with a sample that shows the returned format of the caption track list, see the documentation at https://developers.google.com/youtube/2.0/developers_guide_protocol_captions#Retrieve_Caption_Set
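The two feed URLs can be assembled from their placeholders. This minimal sketch only builds the URL strings and sends no request. The function names are hypothetical, https is assumed for both feeds, and optional query parameters are passed through exactly as given, since the exact names the feed accepts are not confirmed here.

```python
from urllib.parse import quote, urlencode

# Base path taken from the feed URLs quoted above; https for both is an assumption.
BASE = "https://gdata.youtube.com/feeds/api/videos"


def caption_list_url(video_id, params=None):
    """Build the feed URL that lists the caption tracks of a video."""
    url = f"{BASE}/{quote(video_id, safe='')}/captions"
    if params:
        url += "?" + urlencode(params)
    return url


def caption_data_url(video_id, caption_track_id):
    """Build the feed URL for the data of one caption track."""
    return (
        f"{BASE}/{quote(video_id, safe='')}"
        f"/captiondata/{quote(caption_track_id, safe='')}"
    )


# Placeholder IDs for illustration only.
print(caption_list_url("VIDEOID"))
print(caption_data_url("VIDEOID", "CAPTIONTRACKID"))
```

Percent-encoding the IDs keeps the path valid even if an ID contains characters that are not URL-safe.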
Also, here are some references which might help:
Upvotes: 4