DSalenga

Reputation: 71

FFmpeg: concatenate video files (containing audio) without filter_complex

I have a problem when trying to concatenate multiple files in FFmpeg; my goal is to create a video presentation by concatenating different types of slides:

(a) Image slides, which are converted into videos by looping the frame for a while. These types of slides do not have audio, so I add a silent audio track to them:

ffmpeg -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 -loop 1 -i inputFile.png -c:v libx264 -shortest -c:a aac -pix_fmt yuv420p -movflags faststart -profile:v high -r 30 -t 3 -level 4.0 -preset veryfast -crf 23 outputA.mp4

(b) Video slides, which have an overlaid watermark and last until the video is over. If the file does not contain audio, a silent track is added in the same way as in the previous case:

-y -i inputFile.mp4 -i Watermark.jpg -filter_complex "[0]scale=1280:720,setsar=sar=1/1[0b]; [1]scale=1280:720[1b]; [0b][1b]overlay=0:0[ov]"  -c:v libx264 -shortest -c:a aac -pix_fmt yuv420p -movflags faststart -profile:v high -r 30 -t 2.8400055 -level 4.0 -preset veryfast -crf 23 outputB.mp4

-y -i inputFile.mp4 -i Watermark.jpg -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 -filter_complex "[0]scale=1280:720,setsar=sar=1/1[0b]; [1]scale=1280:720[1b]; [0b][1b]overlay=0:0[ov]"  -c:v libx264 -shortest -c:a aac -pix_fmt yuv420p -movflags faststart -profile:v high -r 30 -t 2.8400055 -level 4.0 -preset veryfast -crf 23 outputC.mp4

So, once I have all the generated files and a .txt file with all filenames, I want to concatenate using the simple command:

-y -f concat -safe 0 -i textfile.txt -c copy  outputConcat.mp4

Unfortunately, the result is far from perfect, because the audio screws everything up. I know the audio is the problem because running the same command without audio (that is, with -c:v copy -an instead of -c copy) works fine.

One solution I've been testing is to use the concat filter inside filter_complex (transcoding both audio and video again), but I am concerned about speed, and this approach is too slow to be usable.

-y  -i Slide1.mp4 -i Slide2.mp4 -i Slide3.mp4 -filter_complex " [0:v:0][0:a:0] [1:v:0][1:a:0] [2:v:0][2:a:0] concat=n=3:v=1:a=1[v][a]" -map "[v]" -map "[a]" -c:v libx264 -c:a aac -pix_fmt yuv420p -movflags faststart -profile:v high -r 30 -level 4.0 -preset veryfast -crf 23 Report.mp4

Another idea I had was to: (1) concatenate only the audio tracks inside filter_complex (much faster), (2) concatenate only the video without using filter_complex (using a .txt file and -c:v copy -an), and (3) add the audio obtained in (1) to the result obtained in (2). However, the audio obtained in (1) is shorter than the video obtained in (2). All audio tracks are encoded with AAC and have the same sampling frequency; the only parameter that changes from one to another is the bitrate (kb/s).

Can you please help me find a way to concatenate these video slides without having to use filter_complex?

Thank you very much!

Upvotes: 1

Views: 1045

Answers (1)

Gyan

Reputation: 93329

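Maintaining A/V sync with concat when using an MDCT-based audio codec like AAC is always a chore. AAC is encoded in fixed-size frames (1024 samples for AAC-LC) plus encoder priming samples, so each segment's audio duration rarely matches its video duration exactly, and the mismatch accumulates across segments.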

I'd suggest performing two concat operations within the same command.

ffmpeg -f concat -i video.txt -f concat -i audio.txt -map 0:v -c:v copy -map 1:a -c:a aac -shortest out.mp4

Where video.txt is your current list.

For audio, create an empty file like this:

ffmpeg -f lavfi -i anullsrc -t 100 empty.wav

Then create an audio.txt like this:

file 'empty.wav'
duration 15.4
file 'videoslide1.mp4'
file 'empty.wav'
duration 3
file 'videoslide2.mp4'
file 'videoslide3.mp4'
file 'empty.wav'
duration 37
file 'videoslide4.mp4'
file 'empty.wav'

Only empty.wav and videos with existing audio should be listed. empty.wav will appear where an audio-less video is listed in video.txt. The duration for each empty.wav entry is the total duration of the audio-less videos between the entries for video slides with audio. This duration has to be less than the duration of empty.wav (100s in this case).
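The rules above can be automated. The following is only a sketch, assuming a hypothetical manifest on stdin with one slide per line in video.txt order: file name (no whitespace), duration in seconds, and "yes" or "no" for whether the slide has real audio. The function name build_audio_list and the manifest format are not part of FFmpeg; they are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not an FFmpeg tool): prints audio.txt entries on stdout.
# Input on stdin, one slide per line, in the same order as video.txt:
#   <file> <duration_seconds> <yes|no>
# Assumes file names contain no whitespace or single quotes.
build_audio_list() {
    awk -v q="'" '
        # Slide with real audio: first flush the accumulated silent gap, then list the slide.
        $3 == "yes" {
            if (gap > 0) {
                printf "file %sempty.wav%s\nduration %.3f\n", q, q, gap
                gap = 0
            }
            printf "file %s%s%s\n", q, $1, q
            next
        }
        # Audio-less slide: add its duration to the current silent gap.
        NF >= 2 { gap += $2 }
        # Trailing audio-less slides: emit one last silent entry.
        END {
            if (gap > 0) printf "file %sempty.wav%s\nduration %.3f\n", q, q, gap
        }
    '
}
```

For example, `build_audio_list < slides.tsv > audio.txt` would write the list, where slides.tsv is the hypothetical manifest. Each emitted duration still has to stay below the length of empty.wav, so generate a longer empty.wav if a gap exceeds 100 seconds.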

The advantage of this method is that there's no video transcoding, and it's easy to tweak the duration values to adjust sync issues.
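To decide how much to tweak a duration value, it can help to compare the video and audio stream durations of an intermediate result. This is a rough sketch, not a definitive check: stream_duration wraps a standard ffprobe query, while av_drift is a hypothetical helper that just subtracts two numbers. Some containers do not report a per-stream duration, in which case ffprobe prints N/A.

```shell
#!/bin/sh
# Print the duration (in seconds) of one stream of a media file.
# $1 = file, $2 = stream specifier such as v:0 or a:0
stream_duration() {
    ffprobe -v error -select_streams "$2" \
        -show_entries stream=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Hypothetical helper: video duration minus audio duration, in seconds.
# A positive result means the audio is shorter than the video.
av_drift() {
    awk -v v="$1" -v a="$2" 'BEGIN { printf "%.3f\n", v - a }'
}
```

A possible use is `av_drift "$(stream_duration out.mp4 v:0)" "$(stream_duration out.mp4 a:0)"`; the result suggests roughly how many seconds to add to or remove from the empty.wav duration entries.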

Upvotes: 1
