Reputation: 1502
This is more of a conceptual question than a technical one. My understanding of H.264 is that it relies on past and future frames to compress video data. It's trivial to take a fully compressed H.264 video file and stream it via RTP or any other protocol of your choice, but how does this work with real-time video? In real-time video you only have access to past and current frames and don't know the full length of the video, so how can the H.264 codec actually compress the video and prepare it to be an RTP payload? Does it simply buffer and chunk the video into arbitrarily sized smaller videos and compress those? The only way I can think of this working is to split the video into something like 1-second chunks, compress those as individual videos, and make them the RTP payload. Is this how it's done, or is there more "magic" happening than I suspect?
Upvotes: 6
Views: 2261
Reputation: 1
I'm not an expert, but I have just dabbled in this yesterday and today, system-calling ffmpeg in 5-frame chunks. It was slow but just about real time, at around 1.5 spf.
I think the thing I'm doing wrong (besides making system calls instead of using a proper library in the language directly) is that I'm re-invoking it for every new frame with 5 new frames to compress.
I think it's better to do, say, 10 frames and overlap the calls only every 5th frame instead of every frame; then each frame is only re-compressed once (so every frame of the video is compressed just twice) instead of 5 times, as happens when calling 5 frames on every frame.
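For what it's worth, here is a minimal sketch (not the code from this answer; the resolution, pixel format, and RTP destination are placeholder assumptions) of the usual way to avoid re-compressing frames at all: keep one long-running ffmpeg process and pipe each raw frame into its stdin, so every frame is encoded exactly once as it arrives.

```python
import subprocess

WIDTH, HEIGHT, FPS = 640, 480, 30   # assumed capture parameters

# One long-lived encoder process: raw frames in on stdin, RTP out.
# -tune zerolatency tells libx264 not to wait for future frames.
encoder = subprocess.Popen(
    [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{WIDTH}x{HEIGHT}", "-r", str(FPS),
        "-i", "-",                               # read raw frames from stdin
        "-c:v", "libx264",
        "-preset", "ultrafast", "-tune", "zerolatency",
        "-f", "rtp", "rtp://127.0.0.1:5004",     # placeholder destination
    ],
    stdin=subprocess.PIPE,
)

def send_frame(rgb_bytes: bytes) -> None:
    # Each call feeds exactly one WIDTH*HEIGHT*3-byte frame to the encoder.
    encoder.stdin.write(rgb_bytes)
```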
Upvotes: 0
Reputation: 31110
First, there are three types of frames.
I (Intra) frames, or keyframes. These frames do not reference any other frames; they are standalone and can be decoded without any other frame data, much like a JPEG.
P (Predictive) frames. These can reference frames from the past.
B (Bi-directional) frames. These can reference frames from the past or the future.
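To make the B-frame dependency concrete, here is a toy sketch of my own (it assumes the simple case where each B frame references only the adjacent I/P frames): the encoder has to emit the future reference frame before the B frames that depend on it, which is exactly where the extra latency discussed below comes from.

```python
def decode_order(display_order):
    # Reorder a display-order GOP into decode/transmission order: every run of
    # B frames is emitted after the I/P frame that follows it, because the
    # B frames reference that future frame and cannot be decoded until it exists.
    out, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)   # hold B frames until their future reference arrives
        else:
            out.append(frame)         # the I or P frame goes out first
            out.extend(pending_b)     # then the B frames that were waiting on it
            pending_b.clear()
    out.extend(pending_b)
    return out

print(decode_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# -> ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```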
Option 1: only use I and P frames. This causes the file to be about 10-15% larger (or 10-15% lower quality at the same file size). This is used for interactive systems like video conferencing and screen sharing, where latency is very noticeable.
Option 2: wait for the future to happen. At 30 frames per second, the future will be here in 33 milliseconds.
H.264 specifically can reference up to 16 neighboring frames, but most people limit this to around 4, so waiting for 4 frames adds only about a 133-millisecond delay.
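Just to spell out the arithmetic behind those numbers (my own worked sketch, not part of the original answer): the encoder-side delay is simply the number of future frames waited for divided by the frame rate.

```python
def reorder_delay_ms(future_frames: int, fps: float) -> float:
    # Extra encoder latency from waiting for future reference frames.
    return future_frames / fps * 1000

print(reorder_delay_ms(1, 30))   # ~33 ms: one frame into the future at 30 fps
print(reorder_delay_ms(4, 30))   # ~133 ms: the typical 4-frame limit mentioned above
```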
Upvotes: 9