Reputation: 131
I process an H.264 RTP stream from an IP camera. The camera splits each I-frame into several NAL units, each of which is in turn split into RTP packets (the start and end flags mark the boundaries of each unit, not of the frame).
How can I know when the frame transmission is finished and I have enough data to decompress it? Since the frame consists of several units, the flags cannot be used to determine its end.
Most cameras I have worked with split each frame into RTP packets whose flags mark the start and the end of the frame. So I unpack the data from these packets until the end flag arrives - and there is a complete frame.
The sequence of NAL units that I get from this camera is:
[NAL_UT_SPS] Sequence Parameter Set +
[NAL_UT_PPS] Picture Parameter Set
[NAL_UT_SEI] Supplemental Enhancement Information
[NAL_UT_IDR_SLICE] Part #1 of the I-frame picture data
[NAL_UT_IDR_SLICE] Part #2 of the I-frame picture data
[NAL_UT_IDR_SLICE] Part #3 of the I-frame picture data
[NAL_UT_SLICE] 1st P-frame
[NAL_UT_SLICE] 2nd P-frame
[NAL_UT_SLICE] 3rd P-frame
...
From this sequence it is obvious that I can combine [NAL_UT_SPS] + [NAL_UT_PPS] + [NAL_UT_SEI] + 3*[NAL_UT_IDR_SLICE] into one I-frame that I will later feed to the decoder. But how can I determine how many picture data parts there will be? How can I know, when I've received part #X, that it is not the last one in the sequence?
Any ideas?
Upvotes: 1
Views: 3922
Reputation: 137
RTP defines a Marker bit in the RTP header which signals the end of an access unit with the same RTP timestamp. If the Marker bit is set, it is the last NALU for that particular RTP timestamp.
If you use the Marker bit, you do not need to wait for the next access unit to arrive, which minimizes latency.
You can read more about the Marker bit in the RFC for the H.264 payload format, Section 5.1, page 9.
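As a quick sketch (the function name is mine, not from any library): the marker bit is the most significant bit of the second byte of the fixed RTP header, so checking it takes one mask:

```python
def rtp_marker_set(packet: bytes) -> bool:
    """Return True if the RTP marker (M) bit is set.

    The M bit is the top bit of the second header byte; the
    remaining 7 bits of that byte are the payload type.
    """
    return bool(packet[1] & 0x80)
```

With the M bit available, the depacketizer can hand a complete access unit to the decoder as soon as a marked packet arrives, instead of waiting for the next timestamp to show up.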
Upvotes: 1
Reputation: 131
I solved the problem.
The solution was: attach all non-picture units (NAL_UT_SPS, NAL_UT_PPS, NAL_UT_SEI in the example above) to the start of the frame, and for picture-containing units (NAL_UT_IDR_SLICE, NAL_UT_SLICE) check the first_mb_in_slice field, which is 0 for the first slice of a picture and non-zero for the 2nd, 3rd, and so on.
So if first_mb_in_slice == 0 and the buffer already contains picture data, return the buffered frame and start writing the new frame's data to the buffer; otherwise just append the data without returning a frame. This way we return frame #1 as soon as we start receiving frame #2, and can tell that it is a new frame, not part of the previous one:
[NAL_UT_SPS] frame #1 (I) starts
[NAL_UT_PPS] frame #1 continues
[NAL_UT_SEI] frame #1 continues
[NAL_UT_IDR_SLICE] Frame #1 picture data, Part #1: first_mb_in_slice == 0
[NAL_UT_IDR_SLICE] Frame #1 picture data, Part #2: first_mb_in_slice > 0
[NAL_UT_IDR_SLICE] Frame #1 picture data, Part #3: first_mb_in_slice > 0
[NAL_UT_SLICE] frame #2 (P) starts: first_mb_in_slice == 0 <- at this point we will return the 1st frame
[NAL_UT_SLICE] frame #3 (P) starts: first_mb_in_slice == 0 <- return the 2nd frame
[NAL_UT_SLICE] frame #4 (P) starts: first_mb_in_slice == 0 <- return the 3rd frame
[NAL_UT_SPS] frame #5 (I) starts <- return the 4th frame
...
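Reading first_mb_in_slice only requires decoding the very first Exp-Golomb value of the slice header, right after the one-byte NAL unit header. A minimal sketch (the class and function names are mine):

```python
class BitReader:
    """Reads individual bits, MSB first, from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit offset

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_ue(self) -> int:
        # Unsigned Exp-Golomb: count leading zero bits, then read
        # that many more bits after the terminating 1.
        zeros = 0
        while self.read_bit() == 0:
            zeros += 1
        value = 1
        for _ in range(zeros):
            value = (value << 1) | self.read_bit()
        return value - 1


def first_mb_in_slice(nal: bytes) -> int:
    # first_mb_in_slice is the first syntax element of the slice
    # header, immediately after the one-byte NAL unit header.
    # (A full parser would strip 0x03 emulation-prevention bytes,
    # but those cannot occur this early in the payload.)
    return BitReader(nal[1:]).read_ue()
```

A slice NAL whose first_mb_in_slice decodes to 0 starts a new picture, so that is the point at which the previously buffered frame can be returned.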
Upvotes: 5
Reputation: 8264
Most H.264 decoders accept their input stream as NALs. Unless you have a picky decoder, I would just feed the NALs into the decoder one by one. In general there is no guarantee of a 1:1 relationship between NALs and frames, or even slices.
Upvotes: 0