Bojan

Reputation: 69

Create video with correct FPS from video with incorrect FPS and file containing timestamps for each frame using FFMPEG

I have a video file captured from a web camera using OpenCV in Python. The nominal frame rate of the webcam is 30 FPS, but because of the environment the actual frame rate varies and can sometimes drop as low as 24 FPS. The video is written with OpenCV's VideoWriter using the MP4V FOURCC and always has an FPS value of 30, which makes the video's duration incorrect whenever the actual frame rate was not 30. I also have a file containing the timestamp at which each frame was read from the webcam (generated with Python's time.time()).

Question:
Using FFmpeg (or other software), can I use the timestamp information to create a new video file (probably a VFR file) and then convert it to a CFR file?

I am not sure what the correct approach is for creating a video file with the correct time base. Maybe I can split the video into frames, save them as images, and then use the timestamps and the images to create a VFR video, but I would like to know whether this can be done in another, more elegant way.

Thanks in advance!

Upvotes: 2

Views: 908

Answers (1)

Rotem

Reputation: 32094

We may modify the timestamps with PyAV, using a process called "re-muxing".
"Re-muxing" is the term for replacing the container of a stream without re-encoding the stream.

Assume, for example, that the file type is MP4 and that the file includes a single video stream encoded with the H.264 video codec.
We may replace or modify the MP4 container without decoding and encoding the video.
Since the timestamps are part of the container, we may modify them by re-muxing (the main advantage over re-encoding is that the video quality is perfectly preserved).

The re-muxing solution is based on the "remuxing" code sample from the PyAV documentation cookbook.


For testing, we may start by creating a short synthetic MP4 video file using the FFmpeg CLI (10 frames at 1 fps):

ffmpeg -y -f lavfi -i testsrc=size=192x108:rate=1:duration=10 -vcodec libx264 -crf 10 -pix_fmt yuv444p input.mp4

We may view the timestamps using FFprobe:

ffprobe -show_packets input.mp4

Output:

pts=0
pts_time=0.000000
dts=-32768
dts_time=-2.000000
duration=16384
duration_time=1.000000
...
pts=65536
pts_time=4.000000
dts=-16384
dts_time=-1.000000
duration=16384
duration_time=1.000000
...
pts=32768
pts_time=2.000000
dts=0
dts_time=0.000000
duration=16384
duration_time=1.000000
...
pts=16384
pts_time=1.000000
dts=16384
dts_time=1.000000
duration=16384
duration_time=1.000000
...
pts_time=3.000000

As you can see, the DTS timestamps increase sequentially, but start from -2.
The PTS timestamps run 0, 4, 2, 1, 3... (the order is non-monotonic because of the B-frames).

We can also see that the original timestamps use 16384 ticks per second (1/16384 is the time base used by the MP4 container).

Every frame has a duration of 1 second (16384 ticks).
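
As a small illustration of the tick arithmetic, here is a sketch that converts the tick values printed by FFprobe into seconds. The helper name ticks_to_seconds is made up for this example; only the 1/16384 time base and the tick values come from the FFprobe output.

```python
from fractions import Fraction


def ticks_to_seconds(ticks: int, time_base: Fraction) -> Fraction:
    """Convert a timestamp given in time-base ticks to seconds."""
    return ticks * time_base


# Time base reported for input.mp4 (16384 ticks per second).
mp4_time_base = Fraction(1, 16384)

# PTS values in the order FFprobe printed them (decode order).
for pts in (0, 65536, 32768, 16384):
    print(pts, float(ticks_to_seconds(pts, mp4_time_base)))
```

Using Fraction keeps the conversion exact, which is also how PyAV represents time bases.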


For testing, we are going to use a list of 10 timestamps (given in seconds):

new_pts_list = [0.0, 1.0, 5.0, 7.0, 8.0, 18.0, 19.0, 20.0, 21.0, 22.0]

The large gaps are used for testing.
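
For a real recording, the list would come from the timestamps file instead. The sketch below assumes (this is not stated in the question) that the file holds one time.time() value per line; the function names and the example values are made up. It shifts the absolute capture times so that the first frame starts at 0.0 seconds, which is the form new_pts_list needs.

```python
def parse_timestamp_lines(lines):
    """Parse one float per line, skipping blank lines (assumed file layout)."""
    return [float(line) for line in lines if line.strip()]


def relative_timestamps(absolute_timestamps):
    """Shift absolute capture times so that the first frame is at 0.0 seconds."""
    if not absolute_timestamps:
        return []
    t0 = absolute_timestamps[0]
    return [t - t0 for t in absolute_timestamps]


# Made-up lines, as they might be read from the timestamps file.
example_lines = ["1700000000.00\n", "1700000000.04\n", "\n", "1700000000.09\n"]

new_pts_list = relative_timestamps(parse_timestamp_lines(example_lines))
print(new_pts_list)
```

With a real file, the lines would come from something like open(path).readlines(); the list must contain one entry per frame of the video.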


Computing the index of the timestamps in the list:

Since the PTS timestamps are not sequential (and the DTS doesn't start from 0), we have to compute the index of the frame that each timestamp belongs to.
Once we have the index, we can look up the new timestamp in the list.

Computing the index by the PTS of the packet:

index_of_old_pts = int(np.round(float(packet.pts) / float(packet.duration)))

Computing the index by the DTS of the packet:

index_of_old_dts = int(np.round(float(packet.dts) / float(packet.duration)))

Getting the updated timestamps from the list:

new_pts = new_pts_list[index_of_old_pts]
new_dts = new_pts_list[index_of_old_dts]

Note: The index computation works only when the input video has a constant frame rate.
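
The index computation can be tried without PyAV by plugging in the tick values from the FFprobe output of input.mp4. This is only a sketch: frame_index is a made-up helper name, and Python's built-in round stands in for np.round.

```python
def frame_index(timestamp_ticks, duration_ticks):
    """Frame index of a timestamp, assuming every frame lasts duration_ticks."""
    return int(round(timestamp_ticks / duration_ticks))


new_pts_list = [0.0, 1.0, 5.0, 7.0, 8.0, 18.0, 19.0, 20.0, 21.0, 22.0]
duration = 16384  # packet duration in ticks (1 second at time base 1/16384)

# PTS of the first four packets of input.mp4, in decode order.
old_pts_in_decode_order = [0, 65536, 32768, 16384]

indices = [frame_index(pts, duration) for pts in old_pts_in_decode_order]
new_pts = [new_pts_list[i] for i in indices]

print(indices)
print(new_pts)
```

The first DTS of the input (-32768 ticks) maps to index -2, which is outside the list; the full code sample handles that case separately.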


After getting the updated timestamps, we have to convert them from seconds to time-base ticks and update the timestamps of the packet:

new_pts_in_timebase_units = int(np.round(new_pts / packet.time_base))
new_dts_in_timebase_units = int(np.round(new_dts / packet.time_base))
packet.pts = new_pts_in_timebase_units
packet.dts = new_dts_in_timebase_units

To improve the accuracy of the timestamps, I decided not to rely on the time base of the input, and set the time base of the output to 1/1000000 seconds:

out_video_stream.time_base = Fraction(1, 1000000)
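
To see why the finer time base helps, here is a sketch of the seconds-to-ticks conversion together with the rounding error it introduces. The helper names and the 0.0333 second example value are made up for illustration; the two time bases are the ones discussed in this answer.

```python
from fractions import Fraction


def seconds_to_ticks(seconds: float, time_base: Fraction) -> int:
    """Convert seconds to the nearest whole number of time-base ticks."""
    return int(round(seconds / time_base))


def quantization_error(seconds: float, time_base: Fraction) -> float:
    """Absolute error (in seconds) caused by rounding to whole ticks."""
    return abs(float(seconds_to_ticks(seconds, time_base) * time_base) - seconds)


coarse = Fraction(1, 16384)    # time base of the input MP4
fine = Fraction(1, 1000000)    # time base chosen for the output

print(seconds_to_ticks(5.0, fine))
print(quantization_error(0.0333, coarse), quantization_error(0.0333, fine))
```

The rounding error is at most half a tick, so a time base of 1/1000000 limits it to half a microsecond.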

Code sample:

import av
import numpy as np
from fractions import Fraction

# Build 1 fps input file using FFmpeg CLI (for testing):
# ffmpeg -y -f lavfi -i testsrc=size=192x108:rate=1:duration=10 -vcodec libx264 -crf 10 -pix_fmt yuv444p input.mp4

input_video_file = 'input.mp4'
output_video_file = 'output.mp4'

# List of new PTS in unit of seconds (set long gaps for testing)
new_pts_list = [0.0, 1.0, 5.0, 7.0, 8.0, 18.0, 19.0, 20.0, 21.0, 22.0]

average_out_frame_period = np.mean(np.diff(np.array(new_pts_list)))  # Compute the average frame period (used for negative DTS).

# https://pyav.org/docs/develop/cookbook/basics.html#remuxing

with av.open(input_video_file, 'r') as inp:
    inp_video_stream = inp.streams.video[0]  # Get video stream - assume the input has only one stream, and that stream is a video stream.
    with av.open(output_video_file, 'w', format="mp4") as out:  # Open output file, set format to mp4
        out_video_stream = out.add_stream(template=inp_video_stream)  # Add the input stream to the output container.
        out_video_stream.time_base = Fraction(1, 1000000)  # Set the timebase to 1/1000000 for improving accuracy.

        for i, packet in enumerate(inp.demux(inp_video_stream)):  # Demux the input video stream
            if packet.dts is None:
                continue  # DTS = None marks a flushing packet generated by demux, which should be ignored.

            packet.stream = out_video_stream
            old_timebase = packet.time_base
            packet.time_base = out_video_stream.time_base  # Set the timebase to 1/1000000 for improving accuracy.

            index_of_old_pts = int(np.round(float(packet.pts) / float(packet.duration)))  # Convert the PTS from time-base ticks to frame index of that PTS.

            if index_of_old_pts < 0 or index_of_old_pts >= len(new_pts_list):
                new_pts = index_of_old_pts*average_out_frame_period  # Index outside the list (a negative PTS is not supposed to exist) - use average frame period for setting the new PTS.
            else:
                new_pts = new_pts_list[index_of_old_pts]  # Get the value by the index of PTS in new_pts_list.

            new_pts_in_timebase_units = int(np.round(new_pts / packet.time_base))  # Convert from second to units of time base
            packet.pts = new_pts_in_timebase_units  # Update the PTS of the packet

            index_of_old_dts = int(np.round(float(packet.dts) / float(packet.duration)))  # Convert the DTS from time-base ticks to frame index of that DTS.
            if index_of_old_dts < 0 or index_of_old_dts >= len(new_pts_list):
                new_dts = index_of_old_dts*average_out_frame_period  # Negative DTS - use average frame period for setting the new DTS.
            else:
                new_dts = new_pts_list[index_of_old_dts]  # Get the value by the index of DTS in new_pts_list.

            new_dts_in_timebase_units = int(np.round(new_dts / packet.time_base))  # Convert from second to units of time base
            packet.dts = new_dts_in_timebase_units  # Update the DTS of the packet

            #packet.duration = new_packet_duration_in_timebase_units  #attribute 'duration' of 'av.packet.Packet' objects is not writable

            out.mux(packet)
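
The out-of-range handling above can be checked in isolation. The sketch below mirrors the lookup (list value when the index is in range, index times the average frame period otherwise) and then verifies two conditions that FFmpeg muxers generally expect: DTS increases from packet to packet, and DTS never exceeds PTS. The function names are made up, and the (PTS index, DTS index) pairs are read from the FFprobe listing of input.mp4, with the DTS index of the fifth packet inferred as 2.

```python
def lookup_timestamp(index, pts_list):
    """New timestamp for a frame index; extrapolate when the index is out of range."""
    average_period = (pts_list[-1] - pts_list[0]) / (len(pts_list) - 1)
    if 0 <= index < len(pts_list):
        return pts_list[index]
    return index * average_period


def timestamps_are_muxable(packets):
    """packets: (pts, dts) pairs in seconds, in decode order."""
    previous_dts = None
    for pts, dts in packets:
        if dts > pts:
            return False  # a frame cannot be shown before it is decoded
        if previous_dts is not None and dts <= previous_dts:
            return False  # DTS must keep increasing
        previous_dts = dts
    return True


new_pts_list = [0.0, 1.0, 5.0, 7.0, 8.0, 18.0, 19.0, 20.0, 21.0, 22.0]

# (PTS index, DTS index) of the first five packets, in decode order.
index_pairs = [(0, -2), (4, -1), (2, 0), (1, 1), (3, 2)]
packets = [(lookup_timestamp(p, new_pts_list), lookup_timestamp(d, new_pts_list))
           for p, d in index_pairs]

print(packets)
print(timestamps_are_muxable(packets))
```

Running such a check on the real timestamp list before re-muxing is a cheap way to catch a list that would make the DTS run ahead of the PTS.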

Testing the output using FFprobe:

ffprobe -show_packets output.mp4

Output:

pts=0
pts_time=0.000000
dts=-4666667
dts_time=-4.666667
duration=1000000
duration_time=1.000000
...
pts=7000000
pts_time=7.000000
dts=-2333333
dts_time=-2.333333
duration=1000000
duration_time=1.000000
...
pts=5000000
pts_time=5.000000
dts=0
dts_time=0.000000
duration=1000000
duration_time=1.000000
...
pts=1000000
pts_time=1.000000
dts=1000000
dts_time=1.000000
duration=1000000
duration_time=1.000000
...
pts=6000000
pts_time=6.000000
dts=5000000
dts_time=5.000000
duration=1000000
duration_time=1.000000
...
pts=17000000
pts_time=17.000000
dts=6000000
dts_time=6.000000
duration=1000000
duration_time=1.000000
...

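As you can see, the packets now carry the new, non-uniform timestamps, expressed in microsecond ticks (time base 1/1000000), and the DTS values that fall before the first frame are negative.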
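
For longer videos it is easier to check the timestamps programmatically than by eye. The sketch below parses the default text output of ffprobe -show_packets, where each packet is wrapped in [PACKET] ... [/PACKET] and holds key=value lines. The function name is made up, and the sample text is a shortened copy of the first two packets from the listing.

```python
def parse_ffprobe_packets(text):
    """Parse 'ffprobe -show_packets' text output into a list of dicts."""
    packets, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "[PACKET]":
            current = {}
        elif line == "[/PACKET]":
            if current is not None:
                packets.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return packets


sample = """[PACKET]
pts=0
pts_time=0.000000
dts=-4666667
dts_time=-4.666667
[/PACKET]
[PACKET]
pts=7000000
pts_time=7.000000
dts=-2333333
dts_time=-2.333333
[/PACKET]"""

packets = parse_ffprobe_packets(sample)
pts_times = [float(p["pts_time"]) for p in packets]
print(pts_times)
```

The text to parse could be captured with subprocess.run(["ffprobe", "-show_packets", "output.mp4"], capture_output=True, text=True).stdout; sorting pts_times then gives the presentation order to compare against the timestamp list.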

Note that we can't modify the duration of the packets, because the duration is stored in the H.264 video stream, and not in the MP4 container.
Modifying the frame duration without re-encoding is not supported by PyAV (but this shouldn't be a problem when playing the video with a modern video player).
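
The question also asks about converting the resulting VFR file to CFR. That step requires re-encoding and is not covered by the re-muxing code, but the idea can be sketched in plain Python: for each tick of the constant output frame rate, show the latest source frame whose timestamp is not later than the tick. This is a conceptual illustration with a made-up function name, not FFmpeg's actual algorithm.

```python
from bisect import bisect_right


def cfr_frame_map(frame_times, fps):
    """For each CFR output tick, return the index of the source frame to show.

    frame_times: sorted presentation times in seconds, starting at 0.0.
    fps: constant output frame rate.
    """
    if not frame_times:
        return []
    n_out = int(frame_times[-1] * fps) + 1  # ticks up to the last source frame
    return [max(bisect_right(frame_times, k / fps) - 1, 0) for k in range(n_out)]


# Three frames at 0 s, 1 s and 5 s, resampled to 1 fps:
# the second frame is held until the third one arrives.
print(cfr_frame_map([0.0, 1.0, 5.0], 1))
```

Frames are duplicated across long gaps and dropped when several source frames fall between two output ticks, which is why the converted file plays back with the correct duration.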


[Figure: sample output as an animated GIF]

Upvotes: 5
