android developer

Reputation: 116010

Why does muxing to an MP4 file skip one of the frames I provide?

Background

In the past, I've created and even shared an example of how to create an MP4 file from a series of Bitmaps, here, based on here, and I've also published the code on GitHub, here.

It seems to work fine with a single image, as such:

@WorkerThread
private fun testImage() {
    Log.d("AppLog", "testImage")
    val startTime = System.currentTimeMillis()
    Log.d("AppLog", "start")
    val videoFile = File(ContextCompat.getExternalFilesDirs(this, null)[0], "image.mp4")
    if (videoFile.exists())
        videoFile.delete()
    videoFile.parentFile!!.mkdirs()
    val timeLapseEncoder = TimeLapseEncoder()
    val bitmap = BitmapFactory.decodeResource(resources, R.drawable.test)
    val width = bitmap.width
    val height = bitmap.height
    timeLapseEncoder.prepareForEncoding(videoFile.absolutePath, width, height)
    val frameDurationInMs = 1000
    timeLapseEncoder.encodeFrame(bitmap, frameDurationInMs)
    timeLapseEncoder.finishEncoding()
    val endTime = System.currentTimeMillis()
    Log.d("AppLog", "it took ${endTime - startTime} ms to convert a single image ($width x $height) to mp4")
}

The problem

When I try to encode multiple frames, even just 2, I can see that it sometimes skips frames, which also makes the video shorter.

For example, in this scenario it should encode 2 frames of 5 seconds each, yet the output ends up 5 seconds long instead of 10, ignoring the second frame entirely:

@WorkerThread
private fun testImages() {
    Log.d("AppLog", "testImages")
    val startTime = System.currentTimeMillis()
    Log.d("AppLog", "start")
    val videoFile = File(ContextCompat.getExternalFilesDirs(this, null)[0], "images.mp4")
    if (videoFile.exists())
        videoFile.delete()
    videoFile.parentFile!!.mkdirs()
    // Log.d("AppLog", "success creating parent?${videoFile.parentFile.exists()}")
    val timeLapseEncoder = TimeLapseEncoder()
    val bitmap = BitmapFactory.decodeResource(resources, R.drawable.frame1)
    val width = bitmap.width
    val height = bitmap.height
    timeLapseEncoder.prepareForEncoding(videoFile.absolutePath, width, height)
    val delay = 5000
    timeLapseEncoder.encodeFrame(bitmap, delay)
    val bitmap2 = BitmapFactory.decodeResource(resources, R.drawable.frame2)
    timeLapseEncoder.encodeFrame(bitmap2, delay)
    timeLapseEncoder.finishEncoding()
    val endTime = System.currentTimeMillis()
    Log.d("AppLog", "it took ${endTime - startTime} ms to convert a single image ($width x $height) to ${videoFile.absolutePath} ${videoFile.exists()} ${videoFile.length()}")
}

What I've tried

I went over the code and debugged it, but it all seems fine...

The weird thing is that if I change the duration and also add more frames, it seems to be fine. For example, a variant that should produce a 12-second video, where the first 6 seconds show one image and the remaining 6 seconds show another, comes out correctly.

I also tried the equivalent of what I originally did, just with more frames:

for (i in 0 until 500)
    timeLapseEncoder.encodeFrame(bitmap, 10)
val bitmap2 = BitmapFactory.decodeResource(resources, R.drawable.frame2)
for (i in 0 until 500)
    timeLapseEncoder.encodeFrame(bitmap2, 10)

This didn't produce 5 seconds per image (500 frames × 10 ms each) at all...

I thought it might be some issue with the fps, but it's already set in the code to 30, which is reasonable (at 30 fps, one frame period is 1000/30 ≈ 33.3 ms) and probably above the minimum allowed for the MP4 format.

So I tried rewriting the entire implementation. I thought it would help, but it has similar issues:

class BitmapToVideoEncoder(outputPath: String?, width: Int, height: Int, bitRate: Int, frameRate: Int) {
    private var encoder: MediaCodec?
    private val inputSurface: Surface
    private var mediaMuxer: MediaMuxer?
    private var videoTrackIndex = 0
    private var isMuxerStarted: Boolean
    private var presentationTimeUs: Long

    init {
        val format = MediaFormat.createVideoFormat(MediaFormat.MIMETYPE_VIDEO_AVC, width, height)
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
        format.setInteger(MediaFormat.KEY_BIT_RATE, bitRate)
        format.setInteger(MediaFormat.KEY_FRAME_RATE, frameRate)
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1)
        encoder = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC)
        encoder!!.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
        inputSurface = encoder!!.createInputSurface()
        encoder!!.start()
        mediaMuxer = MediaMuxer(outputPath!!, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
        isMuxerStarted = false
        presentationTimeUs = 0
    }

    @Throws(IOException::class)
    fun encodeFrame(bitmap: Bitmap, durationInMs: Long) {
        val frameDurationUs = durationInMs * 1000
        drawBitmapToSurface(bitmap)
        drainEncoder(false)
        // Advance the timestamp that will be stamped onto the next muxed samples
        presentationTimeUs += frameDurationUs
    }

    @Throws(IOException::class)
    fun finishEncoding() {
        drainEncoder(true)
        release()
    }

    private fun drawBitmapToSurface(bitmap: Bitmap) {
        // Drawing into the encoder's input surface queues a frame for encoding
        val canvas = inputSurface.lockCanvas(null)
        canvas.drawBitmap(bitmap, 0f, 0f, null)
        inputSurface.unlockCanvasAndPost(canvas)
    }

    @Throws(IOException::class)
    private fun drainEncoder(endOfStream: Boolean) {
        if (endOfStream) {
            // Sending end-of-stream signal to the encoder
            encoder!!.signalEndOfInputStream()
        }

        val bufferInfo = MediaCodec.BufferInfo()
        while (true) {
            val encoderStatus = encoder!!.dequeueOutputBuffer(bufferInfo, 10000)
            @Suppress("DEPRECATION")
            when {
                encoderStatus == MediaCodec.INFO_TRY_AGAIN_LATER -> {
                    if (!endOfStream) {
                        break
                    }
                }
                encoderStatus == MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED -> {
                    // Output buffers changed (deprecated event, nothing to do)
                }
                encoderStatus == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED -> {
                    if (isMuxerStarted) {
                        throw RuntimeException("format changed twice")
                    }
                    val newFormat = encoder!!.outputFormat
                    videoTrackIndex = mediaMuxer!!.addTrack(newFormat)
                    mediaMuxer!!.start()
                    isMuxerStarted = true
                }
                encoderStatus < 0 -> {
                    // Unexpected result from encoder
                }
                else -> {
                    val encodedData = encoder!!.getOutputBuffer(encoderStatus)
                        ?: throw RuntimeException("encoderOutputBuffer $encoderStatus was null")
                    if (bufferInfo.size != 0) {
                        if (!isMuxerStarted) {
                            throw RuntimeException("muxer hasn't started")
                        }
                        // Adjust the bufferInfo to have the correct presentation time
                        bufferInfo.presentationTimeUs = presentationTimeUs
                        encodedData.position(bufferInfo.offset)
                        encodedData.limit(bufferInfo.offset + bufferInfo.size)
                        mediaMuxer!!.writeSampleData(videoTrackIndex, encodedData, bufferInfo)
                    }
                    encoder!!.releaseOutputBuffer(encoderStatus, false)
                    if ((bufferInfo.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                        // End of stream reached
                        break
                    }
                }
            }
        }
    }

    private fun release() {
        if (encoder != null) {
            encoder!!.stop()
            encoder!!.release()
            encoder = null
        }
        if (mediaMuxer != null) {
            mediaMuxer!!.stop()
            mediaMuxer!!.release()
            mediaMuxer = null
        }
    }

}

Note that this handles only the case where all input bitmaps have the same resolution as the constructor parameters and are not transparent. The input resolution also has to be one that the device's encoder can handle.

To handle transparency too, there are 2 approaches (see the sketch after this list):

  1. Switch to WebM, which supports transparency, and then always fit-center.
  2. Set some background color and always fit-center.
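For the second approach, here's a minimal sketch (my own illustration, not code from the repository) of a helper that fit-centers an arbitrary bitmap onto an opaque background of the encoder's fixed size:

// Hypothetical helper: scales src to fit inside width x height,
// centers it, and fills the remaining area with backgroundColor.
fun fitCenterOnBackground(src: Bitmap, width: Int, height: Int, backgroundColor: Int = Color.BLACK): Bitmap {
    val result = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888)
    val canvas = Canvas(result)
    canvas.drawColor(backgroundColor)
    val scale = minOf(width.toFloat() / src.width, height.toFloat() / src.height)
    val dstWidth = src.width * scale
    val dstHeight = src.height * scale
    val left = (width - dstWidth) / 2f
    val top = (height - dstHeight) / 2f
    canvas.drawBitmap(src, null, RectF(left, top, left + dstWidth, top + dstHeight), Paint(Paint.FILTER_BITMAP_FLAG))
    return result
}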

As for checking which resolutions the device can handle, I need to query the codec capabilities with something like this:

MediaCodec codec = MediaCodec.createEncoderByType(mimeType);
MediaCodecInfo codecInfo = codec.getCodecInfo();
MediaCodecInfo.CodecCapabilities capabilities = codecInfo.getCapabilitiesForType(mimeType);
MediaCodecInfo.VideoCapabilities videoCapabilities = capabilities.getVideoCapabilities();
codec.release();
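For example, a minimal sketch (assuming the AVC encoder, the same MIME type used in the class above) of checking whether a given size is supported:

// Sketch: query the AVC encoder's supported video sizes
val mimeType = MediaFormat.MIMETYPE_VIDEO_AVC
val codec = MediaCodec.createEncoderByType(mimeType)
val videoCapabilities = codec.codecInfo.getCapabilitiesForType(mimeType).videoCapabilities
val isSupported = videoCapabilities.isSizeSupported(1920, 1080)
Log.d("AppLog", "1920x1080 supported? $isSupported widths:${videoCapabilities.supportedWidths} heights:${videoCapabilities.supportedHeights}")
codec.release()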

I didn't add this here because it makes things more complicated. I might add it to the repository later.

This still has issues: sometimes the output ends up with a duration of 0, and even when it doesn't, playing it might behave as if the duration were 0.

The questions

  1. How do I create the video properly, using this same API of adding one frame after another, where each frame can have a different duration and even a different resolution or transparency?

  2. Is there perhaps a better way to create video files from images, where you set the duration of each frame, one after another? A solution that doesn't require a large library and doesn't have a problematic license? I know of FFmpeg, but it's both large and its license seems not so permissive...

Upvotes: 2

Views: 317

Answers (2)

Shaheed Haque

Reputation: 723

For your question 1, is it possible that the delays you are assembling do not EXACTLY match the frame rate you are setting using format.setInteger(MediaFormat.KEY_FRAME_RATE, frameRate)? Video encoders are usually extremely picky about this.

I suggest making sure each original image is displayed for an exact number of frame periods.
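For example, here's a minimal sketch (my illustration, using the question's 30 fps setting) of snapping a requested duration to a whole number of frame periods before passing it to encodeFrame():

// Sketch: round a requested duration to a whole number of frame periods.
// frameRate is assumed to match the value set via MediaFormat.KEY_FRAME_RATE.
fun snapToFramePeriods(durationMs: Long, frameRate: Int): Long {
    val framePeriodMs = 1000.0 / frameRate                    // ~33.33 ms at 30 fps
    val frames = Math.round(durationMs / framePeriodMs).coerceAtLeast(1L)
    return Math.round(frames * framePeriodMs)                 // 5000 ms -> exactly 150 frames
}

With this, 5000 ms stays 5000 ms (exactly 150 frames at 30 fps), while something like 10 ms is rounded up to one full frame period (~33 ms).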

Upvotes: 0

sdex

Reputation: 3871

Android Media3 Transformer is a perfect choice. Creating a video file from images is very easy:

  1. Add dependencies:
    implementation("androidx.media3:media3-transformer:1.3.1")
    implementation("androidx.media3:media3-common:1.3.1")
  2. Create an instance of the transformer:
    val transformer = Transformer.Builder(/*context*/ this).build()
  3. Then, create a new composition from the media items sequence (I used images from the assets folder just for demonstration):
    val imageFrameCount = 31
    val editedMediaItemList = ImmutableList.of(
        createImageEditedMediaItem("asset:///1.png", imageFrameCount),
        createImageEditedMediaItem("asset:///2.png", imageFrameCount)
    )
    val composition = Composition.Builder(
        ImmutableList.of(EditedMediaItemSequence(editedMediaItemList))
    )
        .build()

The function to create EditedMediaItem from an image:

@OptIn(UnstableApi::class)
private fun createImageEditedMediaItem(
    uri: String,
    frameCount: Int,
    durationSec: Int = 5
): EditedMediaItem {
    return EditedMediaItem.Builder(MediaItem.fromUri(uri))
        .setDurationUs(C.MICROS_PER_SECOND * durationSec)
        .setFrameRate(frameCount)
        .build()
}
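Note that setFrameRate() takes frames per second, so with the values above each 5-second image contributes roughly 31 × 5 = 155 frames to the output.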
  4. Add a listener (it's optional, but we need to know when the transformation is completed) and start the transformer:
    transformer.addListener(object : Transformer.Listener {
        override fun onCompleted(composition: Composition, exportResult: ExportResult) {
            super.onCompleted(composition, exportResult)
            Timber.d("onCompleted: $exportResult")
        }
    })
    
    transformer.start(composition, File(cacheDir, "output.mp4").absolutePath)
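One caveat worth keeping in mind: per the Media3 documentation, a Transformer must be accessed from a single thread, by default the thread it was built on (which must have a Looper, otherwise the main Looper is used), so don't call start() from an arbitrary background thread.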

Upvotes: 1
