Reputation: 994
My goal is to stream out mpegts. As input, I take an mp4 file stream; that is, a video producer writes an mp4 file into a stream that I try to work with. The video may be anywhere from one minute to ten minutes long. Because the producer writes bytes into the stream as it goes, the originally written mp4 header is not complete (the first four bytes prior to ftyp are 0x00 because it doesn't yet know the various offsets... which are written post-recording, I think):
This is how the header of a typical mp4 looks:
00 00 00 18 66 74 79 70 69 73 6f 6d 00 00 00 00 ....ftypisom....
69 73 6f 6d 33 67 70 34 00 01 bb 8c 6d 64 61 74 isom3gp4..»Œmdat
This is how the header of an "in progress" mp4 looks:
00 00 00 00 66 74 79 70 69 73 6f 6d 00 00 00 00 ....ftypisom....
69 73 6f 6d 33 67 70 34 00 00 00 18 3f 3f 3f 3f isom3gp4....????
6d 64 61 74 mdat
It is my guess, but I assume that once the producer completes recording, it updates the header with all the necessary offsets.
I have run into two issues while trying to make this work:
My code sample:
av_register_all();

// The AVIO buffer must be allocated with av_malloc()
int iBufSize = 32 * 1024;
uint8_t* pBuffer = (uint8_t*)av_malloc( iBufSize );

AVFormatContext* pCtx = avformat_alloc_context();
pCtx->pb = avio_alloc_context(
    pBuffer,     // internal buffer
    iBufSize,    // internal buffer size
    0,           // bWriteable (1=true, 0=false)
    stream,      // user data; will be passed to our callback functions
    read_stream, // read callback function
    NULL,        // write callback function (not used in this example)
    NULL         // seek callback function (stream is not seekable)
);
pCtx->pb->seekable = 0;
pCtx->pb->write_flag = 0;
pCtx->iformat = av_find_input_format( "mp4" );
pCtx->flags |= AVFMT_FLAG_CUSTOM_IO;
avformat_open_input( &pCtx, "", pCtx->iformat, NULL );
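For reference, the read_stream callback used above could look roughly like this. This is a minimal sketch: the ByteStream struct and its fields are hypothetical stand-ins for whatever handle your producer gives you, and a real implementation should return FFmpeg's AVERROR_EOF macro instead of a bare -1 (and would block until the producer has delivered more bytes rather than reporting end-of-stream immediately):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stream handle; stands in for the opaque "stream"
 * pointer handed to avio_alloc_context(). */
typedef struct ByteStream {
    const uint8_t *data;   /* bytes received from the producer so far */
    size_t         size;   /* total bytes available */
    size_t         pos;    /* current read position */
} ByteStream;

/* Read callback matching avio_alloc_context()'s read_packet signature.
 * A blocking implementation would wait here until at least one byte
 * is available; this sketch just serves what it already has. */
static int read_stream(void *opaque, uint8_t *buf, int buf_size)
{
    ByteStream *s = (ByteStream *)opaque;
    size_t left = s->size - s->pos;
    if (left == 0)
        return -1; /* FFmpeg expects AVERROR_EOF here (a negative value) */
    size_t n = left < (size_t)buf_size ? left : (size_t)buf_size;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return (int)n;
}
```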
Obviously, this does not work as I need (my expectations were wrong). Once I substitute a file of finite size with a stream of varying length, I cannot have avformat_open_input wait around for the stream to finish before attempting further processing.
As such, I need to find a way to open the input without attempting to read it, and only read when I execute av_read_frame. Is this at all possible to do using custom AVIO? That is: prepare/open input -> read initial input data into input buffer -> read frame/packet from input buffer -> write packet to output -> repeat reading input data until the end of stream.
I scoured Google and only saw two alternatives: providing a custom URLProtocol and using AVFMT_NOFILE.
Custom URLProtocol
This sounds like a somewhat backwards way to accomplish what I'm after. I understand it is best used when there is a file source available, whereas I am trying to read from a byte stream. Another reason I think it doesn't fit my needs is that a custom URLProtocol needs to be compiled into the ffmpeg lib, correct? Or is there a way to register it manually at runtime?
AVFMT_NOFILE
This seems like something that should actually work best for me. The flag itself says that there is no underlying source file and assumes I will handle all the reading and provisioning of input data. The trouble is that I haven't seen any online code snippets using it so far, but my assumed workflow is described further below.
I am really hoping to get some suggestions or food for thought from anyone, because I am a newbie to ffmpeg and digital media, and my second issue assumes that I can stream output while ingesting input.
As I mentioned above, I have a handle on the mp4 file bytestream as it would be written to the hard disk. The format is mp4 (h.264 and aac). I need to remux it to mpegts prior to streaming it out. This shouldn't be difficult because mp4 and mpegts are simply containers. From what I have learned so far, an mp4 file looks like the following:
[header info containing format versions]
mdat
[stream data, in my case h.264 and aac streams]
[some trailer separator]
[trailer data]
If that is correct, I should be able to get a handle on the h.264 and aac interleaved data simply by starting to read the stream after the "mdat" identifier, correct?
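As a quick illustration of that idea, locating the start of the mdat payload could look like the toy sketch below. Note this is just a byte scan for illustration; a real parser should walk the box sizes instead, since the "mdat" fourcc can also occur inside payload data:

```c
#include <string.h>

/* Return the offset of the first payload byte after the "mdat" fourcc,
 * or -1 if not found. Toy sketch: a proper MP4 reader walks box
 * headers (32-bit big-endian size + fourcc) rather than scanning. */
static long find_mdat_payload(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(buf + i, "mdat", 4) == 0)
            return (long)(i + 4);
    return -1;
}
```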
If that is true and I decide to go with the AVFMT_NOFILE approach of managing input data, I can just ingest stream data (into the AVFormatContext buffer) -> av_read_frame -> process it -> populate the AVFormatContext with more data -> av_read_frame -> and so on until the end of the stream.
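For the remuxing step itself, the core loop for copying packets from an opened mp4 demuxer into an mpegts muxer might look roughly like this. This is a hedged sketch against the era-appropriate libavformat API (av_register_all-era, hence avcodec_copy_context and av_free_packet); ifmt_ctx is assumed to be an input context opened as above, error handling is elided, and ofmt_ctx->pb must be pointed at a writable AVIOContext before writing the header:

```c
// Sketch: stream-copy (no transcode) from mp4 input to mpegts output.
AVOutputFormat* ofmt = av_guess_format("mpegts", NULL, NULL);
AVFormatContext* ofmt_ctx = NULL;
avformat_alloc_output_context2(&ofmt_ctx, ofmt, NULL, NULL);

// Mirror each input stream into the output.
for (unsigned i = 0; i < ifmt_ctx->nb_streams; i++) {
    AVStream* in  = ifmt_ctx->streams[i];
    AVStream* out = avformat_new_stream(ofmt_ctx, NULL);
    avcodec_copy_context(out->codec, in->codec);
}

// ofmt_ctx->pb must already be a writable AVIOContext here.
avformat_write_header(ofmt_ctx, NULL);

AVPacket pkt;
while (av_read_frame(ifmt_ctx, &pkt) >= 0) {
    AVStream* in  = ifmt_ctx->streams[pkt.stream_index];
    AVStream* out = ofmt_ctx->streams[pkt.stream_index];
    // Rescale timestamps from the input to the output time base.
    pkt.pts      = av_rescale_q(pkt.pts,      in->time_base, out->time_base);
    pkt.dts      = av_rescale_q(pkt.dts,      in->time_base, out->time_base);
    pkt.duration = av_rescale_q(pkt.duration, in->time_base, out->time_base);
    av_interleaved_write_frame(ofmt_ctx, &pkt);
    av_free_packet(&pkt); // av_packet_unref() in newer ffmpeg
}
av_write_trailer(ofmt_ctx);
```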
I know, this is a mouthful and a dump of my thoughts but I would appreciate any discussion, pointers, thoughts!
Upvotes: 2
Views: 4277
Reputation: 155
You'll have to create a custom AVIOContext.
AVIOContext
appears to require only the following functions to be implemented:
int (*read_packet)(void *opaque, uint8_t *buf, int buf_size)
int (*write_packet)(void *opaque, uint8_t *buf, int buf_size)
int (*seek)(void *opaque, int64_t offset, int whence)
h/t to Coder's Diary for pointing me in the right direction.
Upvotes: 0
Reputation: 994
OK, another question researched and answered by myself...
Turns out, as I theorized in the question, an mp4 file is not fully written until the end. During a direct write to a file on disk, the producer seeks back to the start of the video and updates all the pointers to the various atoms. That is, the general structure of this mp4 is ftyp -> mdat -> moov, where moov contains all the metadata about the contained tracks. Unfortunately, it is written last; however, its offset is recorded in the header. That is why the seek is required: mdat is of varying length (it contains the raw encoded frames, and there can be any number of them), so the moov atom is offset by the length of mdat. When the producer finishes writing the file, it updates the header with the proper location of moov.
For additional references: Android broadcasting without disk writes
If this approach is taken, the finalized file must be "fixed".
There is a helpful suggestion on fixing the file in the comments section of the specified link:
Just to help those having issues, the SDK seems to try to seek to insert the size values of the mdat atom, and also the moov header. I set the encoder in this example to produce a THREE_GPP file. In order to play the output THREE_GPP, you're going to need to first create the header in the first 28 bytes prior to the mdat atom (which should all be zeros). 00 00 00 18 66 74 79 70 33 67 70 34 00 00 03 00 33 67 70 34 33 67 70 36 00 02 F1 4D 6D The 6D is the 'm' first byte in the mdat atom. The four bytes proceeding that need to be modified to include the integer value of the byte in your stream containing the output moov atom (Which should be output upon stopping the recording). As long as this header is correctly set, and the player can locate the moov atom- everything should play back correctly. Also, the socket method here isn't very flexible- you can perform finer alterations of the packet data to a network (I'm attempting this at the moment for live streaming), by providing it with a local socket, and then connecting to that local socket and processing its output independently (In a thread for instance) for transmission over UDP, RTP, etc..
-- Jason
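The byte patching Jason describes boils down to writing a couple of 32-bit big-endian values into the already-recorded buffer. The sketch below illustrates the idea for the mdat size field under the simple ftyp -> mdat -> moov layout from earlier; the function names and the exact offset arithmetic are my own illustration, not taken from the SDK:

```c
#include <stdint.h>
#include <stddef.h>

/* MP4 box sizes and offsets are stored big-endian. */
static void write_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)(v);
}

/* Patch the mdat box's size field once the moov offset is known.
 * mdat_size_off is the file offset of the 4 size bytes immediately
 * before the "mdat" fourcc; since a box's size field counts from the
 * start of the box (the size field itself), the mdat size is simply
 * moov_off - mdat_size_off. Illustrative sketch only. */
static void fix_mdat_size(uint8_t *file, size_t mdat_size_off, uint32_t moov_off)
{
    write_be32(file + mdat_size_off, moov_off - (uint32_t)mdat_size_off);
}
```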
However, it should be obvious that this does not help at all with streaming live playable video.
I am now faced with the only remaining possibility: getting rtp (via the SipDroid or SpyCamera methods) and converting via ffmpeg on the NDK side.
Upvotes: 2