Reputation: 1401
I need to concatenate mp4 files from different sources, this means some of the variables are out of my control such as timebase, aspect ratio and encoding. So to get around this I re-encode and attempt to standardise the files before concatenating them. Unfortunately, despite this I get Non-monotonous DTS in output stream
warnings during the concatenation stage, and the output video seems to always have broken audio/video syncing by the last segment.
I know there are a lot of other questions out there about resolving the warning above, but I've been through them all and reviewed the documentation.. but unfortunately I've been still been unable to solve it..
I think the thing which I don't understand is: if I have mp4s from different sources, what exactly do I need to do to ensure that the files will always neatly concatenate together?
What I've tried so far
The script I'm using to standardise the mp4 files before concantenation is the following (amends resolution, frame rate, timebase, bitrate for audio, bitrate for video, audio encoding and video encoding):
ffmpeg -y -i $1 -vf 'scale=1280:720:force_original_aspect_ratio=1,pad=1280:720:(ow-iw)/2:(oh-ih)/2' -r 30 -video_track_timescale 90000 -b:a 128K -b:v 1200K -c:a aac -c:v libx264 $2
Here's the ffprobe
output on two of the files, there are some differences but I'm not sure if they are significant?
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'intro.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.12.100
Duration: 00:00:08.98, start: 0.000000, bitrate: 1210 kb/s
Stream #0:0(eng): Video: h264 (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1069 kb/s, 30 fps, 30 tbr, 90k tbn, 60 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 132 kb/s (default)
Metadata:
handler_name : SoundHandler
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'middle.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.12.100
Duration: 00:00:59.72, start: 0.000000, bitrate: 1200 kb/s
Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1063 kb/s, 30 fps, 30 tbr, 90k tbn, 60 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
They all have normal video and audio at this point.
After that I concatenate them and add a watermark using the following (it sucks that I need to re-encode here):
ffmpeg -y \
-f concat \
-safe 0 \
-i $INFILES \
-c:v libx264 \
-c:a copy \
-preset fast \
-vf drawtext=enable="'between(t, $DRAW_TEXT_DELAY, $DRAW_TEXT_DURATION)': fontfile=$FONT_DIR/$FONT: text='$TEXT': fontcolor=$FONTCOLOR: fontsize=$FONTSIZE: $POSITION" \
$OUTFILE
INFILES
is a path to a text file formatted like:
file /usr/src/app/data/test/out/intro.mp4
file /usr/src/app/data/test/out/middle.mp4
file /usr/src/app/data/test/out/outro.mp4
What am I missing here? Is there a way to debug this further?
Upvotes: 2
Views: 1336
Reputation: 3074
There are three methods to concatenate files in FFmpeg.
Demuxer (You are using this)
This method can be used to concat files with the same paramters, like codecs, size, PAR, etc.
$ ffmpeg -concat -i files.txt [...] output.mp4
Protocol
Same as the first one, but on top of that, this method is useful for files that can be copied together bitwise - it doesn't involves re-encoding (some formats support this, like MpegTS or some lossless formats).
$ ffmpeg -i "concat:FILE_0| ... |FILE_N" [...] output.mp4
Filter
If you have videos with different codecs, you have to use this method:
$ ffmpeg -i <FILE_0> ... -i <FILE_N> [...] -filter_complex "[0:0][0:1]...[<N>:0][<N>:1] concat=n=<N>:v=1:a=1[v_out][a_out]" -map [v_out] -map [a_out] output.mp4
The concat
filter decodes the video and reencodes it with the same parameters. It also takes care of the audio streams. I'm not entirely sure what does it do if the resolutions are different, but this should be a good start.
Upvotes: 0
Reputation: 93299
Your audio streams have distinct sampling rates, and may have distinct channel count as well. Also, compressed MPEG audio streams will introduce slight async upon concat.
Use
ffmpeg -y -i $1 -vf 'scale=1280:720:force_original_aspect_ratio=1,pad=1280:720:(ow-iw)/2:(oh-ih)/2,setsar=1,format=yuv420p' -r 30 -c:v libx264 -b:v 1200K -ac 2 -ar 48000 -c:a pcm_s16le -video_track_timescale 90000 $2
to standardize, but save to MOV.
Then during concat, change -c:a copy
to -c:a aac
.
Upvotes: 4