Reputation: 1620
I used yt-dlp --skip-download --write-auto-sub --sub-format json3 $target_url to download a YouTube video's captions/transcript. The JSON starts with this key-value pair:
wireMagic: pb3
Is there a schema definition (JSON Schema, DTD, etc.) for this wireMagic format?
I Googled it and also searched the yt-dlp and youtube-dl GitHub projects, but found no details.
The transcript's JSON body contains many cryptic keys, such as wsWinStyles, mhModeHint, juJustifCode, sdScrollDir, wpWinPositions, apPoint, ahHorPos, avVerPos, etc., which I'd like to understand.
Is it a JSON-ified version of a web or pre-web captioning standard? Or is it an internal Google/YouTube caption format?
Upvotes: 7
Views: 382
Reputation: 144
It doesn't seem to be a standard, since it isn't documented anywhere. It's an internal Google/YouTube subtitle format.
Upvotes: 1
Reputation: 442
Yes, as far as I can tell from the yt-dlp and youtube-dl GitHub projects, this is an internal Google/YouTube caption format, and there is no public documentation for it.
It seems that wsWinStyles, mhModeHint, and juJustifCode are internal codes for styling and positioning elements.
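Since nothing is documented, a practical way to study these keys is to inventory them in your own downloads. The sketch below is not based on any YouTube specification: it counts every key in a parsed json3 file and splits each one into a short lowercase prefix and a CamelCase name, on the unverified assumption (from the examples above) that keys like wsWinStyles and juJustifCode follow that pattern. The function names are hypothetical.

```python
import json
import re
from collections import Counter

# Assumption, not a documented rule: keys look like a short lowercase
# abbreviation followed by a CamelCase name, e.g. "wsWinStyles".
_KEY_RE = re.compile(r"^([a-z]+)([A-Z].*)$")


def split_key(key):
    """Split 'juJustifCode' into ('ju', 'JustifCode').

    Keys with no uppercase letter (e.g. 'events', 'utf8') come back as (key, '').
    """
    m = _KEY_RE.match(key)
    return (m.group(1), m.group(2)) if m else (key, "")


def key_inventory(node, counts=None):
    """Recursively count every object key in a parsed JSON tree."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        for key, value in node.items():
            counts[key] += 1
            key_inventory(value, counts)
    elif isinstance(node, list):
        for item in node:
            key_inventory(item, counts)
    return counts


def print_key_report(path):
    """Print each key in a .json3 file with its count and its guessed split."""
    with open(path, encoding="utf-8") as f:
        doc = json.load(f)
    for key, n in key_inventory(doc).most_common():
        prefix, name = split_key(key)
        print(f"{key:20} {n:6}  {prefix!r} -> {name!r}")
```

For example, print_key_report("captions.en.json3") (a hypothetical file name) lists every key with how often it occurs, which makes it easier to see which keys appear on every caption event and which appear only a handful of times.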
I would try the WebVTT (.vtt) or SubRip (.srt) caption formats with yt-dlp instead (e.g. --sub-format vtt in place of --sub-format json3), as they are more standardized and well documented.
https://github.com/yt-dlp/yt-dlp/blob/master/README.md
Upvotes: -1