I am implementing a (very) low latency video streaming C++ application using ffmpeg. The client receives a video which is encoded with x264’s zerolatency preset, so there is no need for buffering. As described here, if you use av_read_frame() to read packets of the encoded video stream, you will always have at least one frame delay because of internal buffering done in ffmpeg. So when I call av_read_frame() after frame n+1 has been sent to the client, the function will return frame n.
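For reference, the encoder on the server side is tuned so that it introduces no delay of its own; a minimal sketch of that kind of configuration (function and variable names are placeholders, not my actual code):

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/opt.h>
}

// Sketch: encoder-side libx264 settings for zero latency
// (enc_ctx is assumed to be an AVCodecContext set up for AV_CODEC_ID_H264).
void tune_for_zero_latency(AVCodecContext *enc_ctx) {
    av_opt_set(enc_ctx->priv_data, "preset", "ultrafast", 0);
    av_opt_set(enc_ctx->priv_data, "tune", "zerolatency", 0);
    enc_ctx->max_b_frames = 0; // no B-frames, so the encoder never reorders frames
}

So the one frame of latency really comes from the demuxing side, not from the encoder.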
Getting rid of this buffering by setting the AVFormatContext flags AVFMT_FLAG_NOPARSE | AVFMT_FLAG_NOFILLIN, as suggested in the source, disables packet parsing and therefore breaks decoding, as noted there.
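Concretely, that suggestion amounts to something like the following (sketch only; "url" is a placeholder):

extern "C" {
#include <libavformat/avformat.h>
}

// Sketch: opening the input with the suggested flags.
// This removes the buffering, but the demuxer no longer parses packets correctly.
AVFormatContext *open_without_parsing(const char *url) {
    AVFormatContext *ctx = avformat_alloc_context();
    ctx->flags |= AVFMT_FLAG_NOPARSE | AVFMT_FLAG_NOFILLIN;
    if (avformat_open_input(&ctx, url, NULL, NULL) < 0)
        return NULL;
    return ctx;
}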
Therefore, I am writing my own packet receiver and parser. First, here are the relevant steps of the working solution (including one frame delay) using av_read_frame():
AVFormatContext *fctx;
AVCodecContext *cctx;
AVPacket *pkt;
AVFrame *frm;
//Initialization of AV structures
//…
//Main Loop
while (true) {
    //Receive packet
    av_read_frame(fctx, pkt);
    //Decode:
    avcodec_send_packet(cctx, pkt);
    avcodec_receive_frame(cctx, frm);
    //Display frame
    //…
}
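For completeness, the elided initialization is just the usual demuxer/decoder setup, roughly along these lines (sketch only; the URL is a placeholder and error handling is omitted):

extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
}

// Sketch of the elided initialization ("tcp://host:port" is a placeholder).
void init_av(AVFormatContext *&fctx, AVCodecContext *&cctx,
             AVPacket *&pkt, AVFrame *&frm) {
    av_register_all();          // only needed on older ffmpeg versions
    avformat_network_init();
    fctx = NULL;
    avformat_open_input(&fctx, "tcp://host:port", NULL, NULL);
    avformat_find_stream_info(fctx, NULL);
    int idx = av_find_best_stream(fctx, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
    const AVCodec *dec = avcodec_find_decoder(fctx->streams[idx]->codecpar->codec_id);
    cctx = avcodec_alloc_context3(dec);
    avcodec_parameters_to_context(cctx, fctx->streams[idx]->codecpar);
    avcodec_open2(cctx, dec, NULL);
    pkt = av_packet_alloc();
    frm = av_frame_alloc();
}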
And below is my solution, which mimics the behavior of av_read_frame() as far as I could reproduce it. I was able to track the source code of av_read_frame() down to ff_read_packet(), but I cannot find the source of AVInputFormat.read_packet().
int tcpsocket;
AVCodecContext *cctx;
AVPacket *pkt;
AVFrame *frm;
uint8_t recvbuf[(int)10e5];
memset(recvbuf, 0, 10e5);
int pos = 0;
AVCodecParserContext *parser = av_parser_init(AV_CODEC_ID_H264);
parser->flags |= PARSER_FLAG_COMPLETE_FRAMES;
parser->flags |= PARSER_FLAG_USE_CODEC_TS;
//Initialization of AV structures and the tcpsocket
//…
//Main Loop
while (true) {
    //Receive packet
    int length = read(tcpsocket, recvbuf, 10e5);
    if (length >= 0) {
        //Creating temporary packet
        AVPacket *tempPacket = new AVPacket;
        av_init_packet(tempPacket);
        av_new_packet(tempPacket, length);
        memcpy(tempPacket->data, recvbuf, length);
        tempPacket->pos = pos;
        pos += length;
        memset(recvbuf, 0, length);
        //Parsing temporary packet into pkt
        av_init_packet(pkt);
        av_parser_parse2(parser, cctx,
                         &(pkt->data), &(pkt->size),
                         tempPacket->data, tempPacket->size,
                         tempPacket->pts, tempPacket->dts, tempPacket->pos);
        pkt->pts = parser->pts;
        pkt->dts = parser->dts;
        pkt->pos = parser->pos;
        //Set keyframe flag
        if (parser->key_frame == 1 ||
            (parser->key_frame == -1 &&
             parser->pict_type == AV_PICTURE_TYPE_I))
            pkt->flags |= AV_PKT_FLAG_KEY;
        if (parser->key_frame == -1 &&
            parser->pict_type == AV_PICTURE_TYPE_NONE &&
            (pkt->flags & AV_PKT_FLAG_KEY))
            pkt->flags |= AV_PKT_FLAG_KEY;
        pkt->duration = 96000; //Same result as in av_read_frame()
        //Decode:
        avcodec_send_packet(cctx, pkt);
        avcodec_receive_frame(cctx, frm);
        //Display frame
        //…
        //Release the temporary packet; the decoder has copied the data by now
        av_packet_unref(tempPacket);
        delete tempPacket;
    }
}
I checked the fields of the resulting packet (pkt) just before avcodec_send_packet() in both solutions. They are, as far as I can tell, identical; the only difference might be the actual content of pkt->data. My solution decodes I-frames fine, but the references in P-frames seem to be broken, causing heavy artifacts and error messages such as “invalid level prefix”, “error while decoding MB xx”, and similar.
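To rule out the payload, one option is to dump pkt->data from both variants just before avcodec_send_packet() and diff the resulting files; a minimal sketch of such a helper:

extern "C" {
#include <libavcodec/avcodec.h>
}
#include <cstdio>

// Sketch: dump a packet's payload so the output of both code paths
// can be compared byte for byte (e.g. with a binary diff).
void dump_packet(const AVPacket *p, std::FILE *out) {
    std::fwrite(p->data, 1, p->size, out);
    std::fflush(out);
}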
I would be very grateful for any hints.
Edit 1: I have developed a workaround for the time being: in the video server, after sending the packet containing the encoded data of a frame, I send one dummy packet which only contains the delimiters marking the beginning and end of the packet. This way, I push the actual video data frames through av_read_frame(). I discard the dummy packets immediately after av_read_frame().
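The server side of this workaround looks roughly like this (sketch only; the function name is illustrative and the concrete delimiter bytes are not shown here):

#include <sys/socket.h>
#include <cstdint>
#include <cstddef>

// Sketch of the workaround on the server (error handling omitted).
// "delim"/"delim_len" stand for whatever delimiter byte sequence the stream uses.
void send_frame_with_dummy(int sock, const uint8_t *frame, size_t frame_len,
                           const uint8_t *delim, size_t delim_len) {
    send(sock, frame, frame_len, 0);   // packet carrying the encoded frame
    send(sock, delim, delim_len, 0);   // dummy packet containing only the delimiters
}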
Edit 2: Solved here by rom1v, as written in his comment to this question.