Programmatically, how does YouTube Content ID work?

YouTube's Content ID system is a mechanism by which a content creator (typically a company) uploads their original copyrighted video to YouTube, and YouTube then searches its entire collection of user-uploaded videos to determine whether a user has uploaded copyrighted content without the content creator's authorization, as explained here.

What's most interesting to me is the claim that Content ID can find copyrighted videos even if the end user who uploaded the material changed the video resolution or uploaded only a subset of the original copyrighted video.

How do you do this programmatically? It's not as simple as just saying

OriginalVideo == UploadedVideo

Lower video resolutions introduce lots of artifacts, which make unauthorized uploads of copyrighted video harder to match. Matching also gets harder when only a small portion of the copyrighted video is used (e.g. 3 seconds of a 10-minute video).

How do you solve this problem programmatically?

Upvotes: 2

Answers (2)

jdhao

Reputation: 28449

This has to do with video fingerprinting. A video is cut into frames, and we extract so-called features from these frames (you can think of these features as fixed-length vectors). When a new video comes in, we repeat this process. We can then look up similar frames in the database of video frame features, and from that find videos similar to the query video (for example, based on how many similar frames they have in common).
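A rough sketch of that idea in Python, only as a toy illustration and not what YouTube actually runs: each frame is reduced to a fixed-length vector of block averages, and a query video is scored by how many of its frames land close to some stored reference frame. The names `synthetic_frame`, `frame_feature`, and `count_matching_frames`, the 4x4 grid, and the distance threshold are all assumptions made up for this example; real systems use far more robust features and an indexed nearest-neighbor search.

```python
from math import sqrt

def synthetic_frame(index, width, height):
    """Hypothetical test pattern: a 4x4 grid of gray levels that depends on
    `index` but not on resolution, standing in for a decoded video frame."""
    return [[((x * 4 // width) * 50 + (y * 4 // height) * 30 + index * 17) % 256
             for x in range(width)]
            for y in range(height)]

def frame_feature(frame, grid=4):
    """Reduce a grayscale frame (list of rows of 0-255 ints) to a
    fixed-length vector of grid*grid block means."""
    h, w = len(frame), len(frame[0])
    feature = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feature.append(sum(block) / len(block))
    return feature

def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def count_matching_frames(query_frames, reference_features, threshold=10.0):
    """Count query frames whose feature lies within `threshold`
    of at least one reference feature (brute-force search)."""
    matches = 0
    for frame in query_frames:
        feature = frame_feature(frame)
        if any(distance(feature, ref) <= threshold for ref in reference_features):
            matches += 1
    return matches

# "Database": features of a 10-frame reference video at 64x64.
reference_features = [frame_feature(synthetic_frame(i, 64, 64)) for i in range(10)]

# Query: a 3-frame excerpt of the same video, re-encoded at 32x32.
clip = [synthetic_frame(i, 32, 32) for i in (3, 4, 5)]
print(count_matching_frames(clip, reference_features))  # prints 3
```

Because the feature is computed from block averages rather than raw pixels, the half-resolution excerpt still matches, and a short excerpt matches because frames are compared individually rather than as a whole video.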

Upvotes: 0

szatmary

Reputation: 31110

Nobody outside of Google can say for sure exactly what Content ID is doing. But it is almost certainly some form of digital fingerprinting (usually using an FFT) or perceptual hashing.
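To give a feel for what a perceptual hash looks like, here is a minimal sketch of the "average hash" variant, one of the simplest members of that family. It is an illustration under my own assumptions, not Content ID's algorithm: `toy_frame` is a made-up test pattern, and the 8x8 hash size and the 5-bit tolerance in `find_clip` are arbitrary choices. Each frame becomes a 64-bit integer, similar frames differ in only a few bits, and a short clip is located by sliding its hash sequence along the reference's.

```python
def toy_frame(index, width, height):
    """Hypothetical test frame: vertical bright/dark bands that shift with
    `index`, plus small deterministic noise imitating compression artifacts."""
    frame = []
    for y in range(height):
        row = []
        for x in range(width):
            base = 200 if (x * 8 // width + index) % 8 < 4 else 50
            noise = (x * 7 + y * 13) % 11 - 5  # in the range -5..5
            row.append(base + noise)
        frame.append(row)
    return frame

def average_hash(frame, size=8):
    """Shrink the frame to size x size cell means, then emit one bit per
    cell: 1 if the cell is brighter than the mean of all cells."""
    h, w = len(frame), len(frame[0])
    cells = []
    for cy in range(size):
        for cx in range(size):
            y0, y1 = cy * h // size, (cy + 1) * h // size
            x0, x1 = cx * w // size, (cx + 1) * w // size
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")

def find_clip(clip_hashes, reference_hashes, max_bits=5):
    """Slide the clip along the reference; return the first offset at which
    every frame hash is within `max_bits` of its counterpart, else None."""
    n = len(clip_hashes)
    for offset in range(len(reference_hashes) - n + 1):
        if all(hamming(c, reference_hashes[offset + i]) <= max_bits
               for i, c in enumerate(clip_hashes)):
            return offset
    return None

# Reference video: 8 frames at 64x64. Upload: frames 3..5 at 32x32.
reference_hashes = [average_hash(toy_frame(i, 64, 64)) for i in range(8)]
clip_hashes = [average_hash(toy_frame(i, 32, 32)) for i in (3, 4, 5)]
print(find_clip(clip_hashes, reference_hashes))  # prints 3
```

The hash survives the resolution change and the noise because it only records whether each coarse region is brighter or darker than average. An FFT-based fingerprint follows the same pattern, with spectral features taking the place of the brightness cells.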

Upvotes: 1
