Reputation: 480
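My website needs to get transcripts (captions) from YouTube videos. According to the API documentation, this is supposed to work by calling the captions list method to find a video's caption tracks, then calling the captions download method to fetch a track.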
Both these methods are supposed to require authorization with one of these scopes:
https://www.googleapis.com/auth/youtube.force-ssl
https://www.googleapis.com/auth/youtubepartner
In reality, the list method works fine with just a regular YouTube API key, no OAuth2 needed. The download method returns a 401 Unauthorized.
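To make the difference concrete, here is a minimal sketch of how the two requests could be built. It assumes the standard `captions` endpoint of the YouTube Data API v3; the helper names `build_list_request` and `build_download_request` are hypothetical and only construct the URL and headers, they do not send anything.

```python
from urllib.parse import urlencode

# Assumed base URL of the captions resource in the YouTube Data API v3.
API_BASE = "https://www.googleapis.com/youtube/v3/captions"


def build_list_request(video_id, api_key):
    """Build the URL for captions.list, authenticated with a plain API key.

    This is the call the question reports as working without OAuth2.
    """
    query = urlencode({"part": "snippet", "videoId": video_id, "key": api_key})
    return f"{API_BASE}?{query}"


def build_download_request(caption_id, access_token=None):
    """Build the URL and headers for captions.download.

    Without an OAuth2 access token the headers stay empty, which is the
    situation in which the question observes a 401 Unauthorized.
    """
    headers = {}
    if access_token:
        headers["Authorization"] = f"Bearer {access_token}"
    return f"{API_BASE}/{caption_id}", headers
```

The caption ID passed to the download helper would come from the `id` field of an item returned by the list call.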
This seems very strange. The captions are publicly available, and you can use scrapers to get them from video pages (not a viable solution for us), so why can't I get that information like any other video data? And why can I use the list method freely but not the download method, even though the docs say both require authorization?
Three years ago, someone from Google answered a similar question, promising that this feature would become available. There are many older questions about this topic, all with inconclusive answers or workarounds. The support page for the API says to ask here with the appropriate tags, so here I am, hoping for an answer.
Upvotes: 4
Views: 1737
Reputation: 2311
Via the YouTube API, this information is indeed unavailable to users who don't own the videos in question.
From the YouTube API docs:
403 Forbidden: The permissions associated with the request are not sufficient to download the caption track. The request might not be properly authorized, or the video owner might not have enabled third-party contributions for this caption.
I ended up creating a scraper based on an older example, since that example no longer works as of 2022.
Mine uses selenium-wire to fetch the captions from the network request to timedtext.
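The general idea can be sketched as follows. This is not the actual scraper, only a rough illustration under some assumptions: that the player issues a request whose path ends in `/timedtext` once captions are loaded, that a local Chrome driver is available, and that a fixed wait is enough for the request to appear. The function names are hypothetical.

```python
from urllib.parse import urlparse


def is_timedtext_request(url):
    """Return True if the URL path ends in /timedtext (assumed caption endpoint)."""
    return urlparse(url).path.rstrip("/").endswith("/timedtext")


def find_timedtext_urls(requests):
    """Pick out the URLs of answered timedtext requests.

    `requests` is any iterable of objects with `.url` and `.response`
    attributes, such as the entries of selenium-wire's `driver.requests`.
    """
    return [
        r.url
        for r in requests
        if r.response is not None and is_timedtext_request(r.url)
    ]


def capture_timedtext_urls(video_url, wait_seconds=10):
    """Open the video page and collect the timedtext request URLs it triggers."""
    # Imported lazily so the helpers above work without selenium-wire installed.
    import time
    from seleniumwire import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(video_url)
        # Crude fixed wait; captions may also need to be switched on in the player.
        time.sleep(wait_seconds)
        return find_timedtext_urls(driver.requests)
    finally:
        driver.quit()
```

Splitting the URL filter from the browser code keeps the matching logic easy to check without launching a browser.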
Upvotes: 0