Reputation: 2544
I have been trying to understand how video encoding works for modern encoders, in particular H264. It is very often mentioned in documentation that residual frames are created from the differences between the current p-frame and the last i-frame (assuming the following frames are not used in the prediction). I understand that a YUV color space is used (maybe YV12), and that one image is "substracted" from the other and then the residual is formed. What I don't understand is how exactly this substraction works. I don't think it is an absolute value of the difference because that would be ambiguous. What is the per pixel formula to obtain this difference?
Upvotes: 6
Views: 2942
Reputation: 1521
Subtraction is just one small step in video encoding; the core principle behind most modern video encoding is motion estimation, followed by motion compensation. Basically, the process of motion estimation generates vectors that show offsets between macroblocks in successive frames. However, there's always a bit of error in these vectors.
So what happens is the encoder will output both the vector offsets, and the "residual" is what's left. The residual is not simply the difference between two frames; it's the difference between the two frames after motion estimation is taken into account. See the "Motion compensated difference" image in the wikipedia article on compensation for a clear illustration of this--note that the motion compensated difference is drastically smaller than the "dumb" residual.
Here's a decent PDF that goes over some of the basics.
A few other notes:
Upvotes: 6