Rub

Reputation: 2748

Concatenating audio files with ffmpeg results in a wrong total duration

By "wrong total duration" I mean a total duration that differs from the sum of the individual durations of the audio files.

sum_duration_files != duration( concatenation of files )

In particular I am concatenating 2 OGG audio files with this command

ffmpeg -safe 0 -loglevel quiet \
  -f concat -segment_time_metadata 1 -i {m3u_file_name} \
  -vf select=concatdec_select \
  -af aselect=concatdec_select,aresample=async=1 \
  {ogg_file_name}

And I get the following

# Output of:  ffprobe <FILE>.ogg


======== files_in 

Input #0, ogg, from 'f1.ogg':
  Duration: 00:00:04.32, start: 0.000000, bitrate: 28 kb/s
    Stream #0:0: Audio: opus, 48000 Hz, mono, fltp


Input #0, ogg, from 'f2.ogg':
  Duration: 00:00:00.70, start: 0.000000, bitrate: 68 kb/s
    Stream #0:0: Audio: vorbis, 44100 Hz, mono, fltp, 160 kb/s
    Metadata:
      ENCODER         : Lavc57.107.100 libvorbis

Note durations: 4.32 and 0.7 sec

And this is the output file.

========== files out (concatenate of files_in)

Input #0, ogg, from 'f_concat_v1.ogg':
  Duration: 00:00:04.61, start: 0.000000, bitrate: 61 kb/s
    Stream #0:0: Audio: vorbis, 48000 Hz, mono, fltp, 80 kb/s
    Metadata:
      ENCODER         : Lavc57.107.100 libvorbis

Duration: 4.61 sec

Since 4.61 sec != 4.32 + 0.70 sec (= 5.02 sec), the concatenated file is about 0.41 sec shorter than expected.
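The comparison above can be automated. The following is a minimal sketch, assuming the durations have already been copied out of the ffprobe output as `HH:MM:SS.xx` strings; the helper names `ffprobe_duration_to_seconds` and `duration_gap` are made up for illustration and are not part of ffmpeg.

```python
def ffprobe_duration_to_seconds(text: str) -> float:
    """Convert an ffprobe-style 'HH:MM:SS.xx' duration string to seconds."""
    hours, minutes, seconds = text.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def duration_gap(input_durations, output_duration) -> float:
    """Return (sum of input durations) - (output duration), in seconds."""
    expected = sum(ffprobe_duration_to_seconds(d) for d in input_durations)
    actual = ffprobe_duration_to_seconds(output_duration)
    return expected - actual


# Values taken from the ffprobe output shown in the question.
gap = duration_gap(["00:00:04.32", "00:00:00.70"], "00:00:04.61")
print(f"missing: {gap:.2f} s")  # prints "missing: 0.41 s"
```

A gap close to zero means the concatenation kept the full length; here roughly 0.41 s is missing.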

Upvotes: 0

Views: 2417

Answers (2)

Rub

Reputation: 2748

I don't know WHY it happens, but I know how to avoid the problem in my particular case.

My case: I am mixing (concatenating) different audio files generated by one single source with silence files generated by me.

Initially I generated the silence files with

# x is a float from python

ffmpeg -f lavfi -i anullsrc=r=44100:cl=mono -t {x:2.1f} -q:a 9 -acodec libvorbis silence-{x:2.1f}.ogg

To resolve the issue, I re-created those silences with the SAME parameters as the audio files I was mixing them with, that is, mono at 48 kHz:

ffmpeg -f lavfi -i anullsrc=r=48000:cl=mono -t {x:2.1f} -c:a libvorbis silence-{x:2.1f}.ogg

And now ffprobe shows the expected result.

========== files out (concatenate of files_in)

Input #0, ogg, from 'f_concat_v2.ogg':
  Duration: 00:00:05.02, start: 0.000000, bitrate: 56 kb/s
    Stream #0:0: Audio: vorbis, 48000 Hz, mono, fltp, 80 kb/s
    Metadata:
      ENCODER         : Lavc57.107.100 libvorbis

Duration: 5.02 = 4.32 + 0.70

If you want to avoid problems when concatenating silence with other sounds, create the silence with the SAME parameters as the sound you will mix it with (mono/stereo and sample rate in Hz).
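Since the durations come from Python anyway, the silence command can be built from the target parameters instead of hard-coding them. This is only a sketch: the function name `silence_command` and its defaults are assumptions for illustration, and it only builds the argument list (it does not run ffmpeg).

```python
import shlex


def silence_command(seconds: float, sample_rate: int = 48000,
                    channel_layout: str = "mono",
                    codec: str = "libvorbis") -> list:
    """Build an ffmpeg argument list that generates a silent OGG file.

    sample_rate and channel_layout should match the files the silence
    will be concatenated with.
    """
    length = f"{seconds:2.1f}"  # same formatting as in the commands above
    return [
        "ffmpeg", "-f", "lavfi",
        "-i", f"anullsrc=r={sample_rate}:cl={channel_layout}",
        "-t", length,
        "-c:a", codec,
        f"silence-{length}.ogg",
    ]


cmd = silence_command(0.7)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, assuming ffmpeg is on the PATH.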

==== Update 2022-03-08

Using the info provided by @kesh I have recreated the silent ogg files using

ffmpeg -f lavfi -i anullsrc=r=48000:cl=mono -t 5.8 -c:a libopus silence-5.8.ogg

And now the command

ffmpeg -safe 0 -f concat -segment_time_metadata 1 \
  -i {m3u_file_name} \
  -vf select=concatdec_select \
  -af aselect=concatdec_select,aresample=async=1 \
  {ogg_file_name}

no longer throws the following error, which used to appear multiple times:

[opus @ 0x558b2c245400] Error parsing the packet header.
Error while decoding stream #0:0: Invalid data found when processing input

I must say that the error was not causing me any actual problem, because the output was what I expected, but I feel better without it.

Upvotes: 0

kesh

Reputation: 5533

The issue here is that this is the wrong concatenation approach for these files. As the FFmpeg wiki article on concatenation suggests, file-level concatenation (-f concat) requires all files in the listing to have exactly the same codec parameters. In your case, only the number of channels (mono) and the sample format (fltp) are common to both files. The codec (opus vs. vorbis) and the sampling rate (48000 vs. 44100) are different.
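A quick way to catch such mismatches before concatenating is to compare the stream parameters that ffprobe reports. The sketch below is an illustration under stated assumptions: the helper names are made up, and the two JSON strings are hand-written to mirror the values shown in the question rather than captured from a real run. `PROBE_ARGS` lists the ffprobe arguments (file name to be appended) that produce JSON of this shape.

```python
import json

# ffprobe arguments that print codec, sample rate and channel count of the
# first audio stream as JSON; append the file name as the last argument.
PROBE_ARGS = ["ffprobe", "-v", "error", "-select_streams", "a:0",
              "-show_entries", "stream=codec_name,sample_rate,channels",
              "-of", "json"]


def audio_params(ffprobe_json: str) -> tuple:
    """Extract (codec, sample_rate, channels) from ffprobe JSON output."""
    stream = json.loads(ffprobe_json)["streams"][0]
    return (stream["codec_name"], int(stream["sample_rate"]),
            int(stream["channels"]))


def mismatches(probe_outputs: dict) -> dict:
    """Return {file: params} for files that differ from the first file."""
    items = [(name, audio_params(text)) for name, text in probe_outputs.items()]
    reference = items[0][1]
    return {name: params for name, params in items[1:] if params != reference}


# Hand-written sample data mirroring the question's f1.ogg and f2.ogg.
sample = {
    "f1.ogg": '{"streams": [{"codec_name": "opus", "sample_rate": "48000", "channels": 1}]}',
    "f2.ogg": '{"streams": [{"codec_name": "vorbis", "sample_rate": "44100", "channels": 1}]}',
}
print(mismatches(sample))  # prints {'f2.ogg': ('vorbis', 44100, 1)}
```

An empty result means the files share codec, sample rate and channel count, which is what -f concat expects.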

-f concat grabs the first set of parameters and runs with it. In your case, it uses 48000 S/s for all the files. Although the second file is 44100 S/s, it is treated as 48k, so it is played back faster than intended and therefore comes out shorter. I don't know how the difference in codec played out in the output.

So, a standard approach is to use the concat filter, -filter_complex concat=n=2:v=0:a=1 (v=0 because these files contain audio only), with the files given as separate inputs.

Out of curiosity, have you listened to the wrong-duration output file? [edit: never mind, your self-answer indicates one of them is a silent track]

Upvotes: 1
