Schema/DTD for YouTube json3 transcript format

I used yt-dlp --skip-download --write-auto-sub --sub-format json3 $target_url to download a YouTube video's captions/transcript. The JSON starts with this key-value pair:

wireMagic: pb3

Is there a schema definition (JSON Schema, DTD, etc) for this wireMagic format?

I Googled it and also searched the yt-dlp and youtube-dl GitHub projects, but found no details.

The transcript's JSON body contains many cryptic keys, such as wsWinStyles, mhModeHint, juJustifCode, sdScrollDir, wpWinPositions, apPoint, ahHorPos, avVerPos etc, which I'd like to understand.

Is it a JSON-ified version of a web or pre-web captioning standard? Or is it an internal Google/YouTube caption format?

Upvotes: 7

Answers (2)

aron shuvax

Reputation: 144

It doesn't seem to be a standard, because it isn't documented anywhere. It's an internal Google/YouTube subtitle format.

Upvotes: 1

MadaraUchiha

Reputation: 442

Yes, json3 is an internal Google/YouTube caption format, and as far as I can tell from the yt-dlp and youtube-dl GitHub projects, there is no public documentation for it.

It seems that wsWinStyles, mhModeHint, and juJustifCode are internal codes for styling and positioning elements.
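
Since there is no schema, the practical route is to read only the keys you need and ignore the rest. Below is a minimal Python sketch of that idea. The key names events, tStartMs, dDurationMs, segs and utf8 are assumptions based on what downloaded json3 files tend to contain, not on any published specification, and the SAMPLE data is made up for illustration.

```python
import json

# Hypothetical sample shaped like a json3 file; the key names are
# assumptions drawn from observed files, not from a published schema.
SAMPLE = """
{
  "wireMagic": "pb3",
  "events": [
    {"tStartMs": 0, "dDurationMs": 2000, "id": 1},
    {"tStartMs": 80, "dDurationMs": 4000,
     "segs": [{"utf8": "hello"}, {"utf8": " world", "tOffsetMs": 500}]},
    {"tStartMs": 4080, "dDurationMs": 10, "segs": [{"utf8": "\\n"}]}
  ]
}
"""


def extract_cues(doc):
    """Return (start_ms, end_ms, text) tuples for events that carry text."""
    cues = []
    for event in doc.get("events", []):
        # Events without "segs" appear to be window/style setup; skip them.
        text = "".join(seg.get("utf8", "") for seg in event.get("segs", []))
        text = text.strip()
        if not text:
            continue  # skip newline-only or empty events
        start = event.get("tStartMs", 0)
        end = start + event.get("dDurationMs", 0)
        cues.append((start, end, text))
    return cues


cues = extract_cues(json.loads(SAMPLE))
for start, end, text in cues:
    print(f"{start / 1000:.3f} --> {end / 1000:.3f}  {text}")
```

Every lookup uses .get() with a default, so unknown or missing keys (including the styling ones such as wsWinStyles) are simply ignored rather than causing errors.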

I would try the WebVTT (.vtt) or SubRip (.srt) caption formats with yt-dlp instead, as they are more standardized and well documented.

https://github.com/yt-dlp/yt-dlp/blob/master/README.md

Upvotes: -1
