Reputation: 141
I have a problem with AAC encoding in FFmpeg.
I have an MP4 file with AAC audio. I tried to copy the audio stream out with ffmpeg.
In the source mp4 file, the first audio noise appears at 0.30 seconds.
After conversion using
ffmpeg -i inputfile.mp4 -c:a copy outputfile.aac
the resulting file is wrong: the first audio noise appears at 0.32 seconds.
The duration of the file is not the same either.
When I force the encoder to libfaac, it works, but the file is too big.
So why doesn't it work when the default encoder is used (aac, libfdk_aac)? Note that the same thing happens when I convert from Audacity.
Thanks a lot
Upvotes: 8
Views: 4350
Reputation: 31
The accepted answer is correct that this is from AAC padding being added for each segment. Muxing to m4a may appear to remove the padding if you do it for a single file, but ffmpeg will, by default, add the padding back in when you concatenate the m4a files using the concat demuxer. (Or it did when I tried at least.)
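To concat seamlessly with no gaps, encode each segment with 2 extra AAC frames of neighboring audio at its beginning and 2 at its end, then trim those extra frames with inpoint and outpoint in the concat demuxer.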
The 2 extra beginning frames are required because each AAC frame is dependent on up to 2 frames before it. So to encode the first frame correctly, it needs that context. The 2 extra frames at the end are required because ffmpeg tapers the audio at the end to avoid a sudden pop. By adding 2 extra frames, we shift the taper so it doesn't affect our actual content. These extra frames are then removed with inpoint and outpoint to avoid repeated content.
Using this method, it's important that all segments have a length that's an exact multiple of an AAC frame duration. If they don't, you will see unpredictable artifacts at segment boundaries.
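As a rough sketch of the bookkeeping described above (not the code from the repo mentioned below), the following Python computes where to cut each segment from the source and which inpoint and outpoint values to write into a concat demuxer list. The helper names, the segment file names, and the choice of 100 frames per segment are all hypothetical. It assumes the encoded segments are M4A files whose timestamps start at the first real sample, and that the caller clamps the last segment's cut_end to the end of the source.

```python
from fractions import Fraction

AAC_FRAME_SAMPLES = 1024  # samples per AAC frame
PAD_FRAMES = 2            # extra frames on each side, as described above


def plan_segment(index, frames_per_segment, sample_rate):
    """Return cut points (source time) and trim points (segment time), in seconds."""
    frame = Fraction(AAC_FRAME_SAMPLES, sample_rate)
    # The first segment has no earlier audio to borrow, so it gets no leading pad.
    lead = 0 if index == 0 else PAD_FRAMES
    first = index * frames_per_segment
    return {
        # Where to cut the source audio before encoding this segment.
        "cut_start": (first - lead) * frame,
        "cut_end": (first + frames_per_segment + PAD_FRAMES) * frame,
        # Where the real content sits inside the encoded segment.
        "inpoint": lead * frame,
        "outpoint": (lead + frames_per_segment) * frame,
    }


def concat_list(paths, frames_per_segment, sample_rate):
    """Build the text of a concat demuxer list that trims the padding frames."""
    lines = []
    for index, path in enumerate(paths):
        plan = plan_segment(index, frames_per_segment, sample_rate)
        lines.append(f"file '{path}'")
        lines.append(f"inpoint {float(plan['inpoint']):.6f}")
        lines.append(f"outpoint {float(plan['outpoint']):.6f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical segment names; 100 frames per segment at 48 kHz.
    print(concat_list(["seg_000.m4a", "seg_001.m4a"], 100, 48000))
```

Working in whole frames with exact fractions keeps every cut point on an AAC frame boundary, which is the condition the method depends on.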
I recently released a repo that demonstrates this further with actual code: https://github.com/wistia/seamless-aac-split-and-stitch-demo.
Upvotes: 3
Reputation: 93329
There is a padding frame in the audio stream which the decoder needs in order to decode the first frame. This is a technical requirement of MDCT audio codecs like AAC. In a timed sample container like MP4/MKV, that first frame has a negative presentation timestamp, so it is not presented. In a raw AAC bitstream there are no timestamps, so that first frame is naively decoded and played like any other. Each frame has 1024 samples, so its duration is about 21.3 ms at 48 kHz or 23.2 ms at 44.1 kHz. Your difference in timing (0.30 s versus 0.32 s) is due to that one-frame offset. Rewrap to a container like M4A to avoid this.
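To make the arithmetic concrete, here is a minimal Python sketch that computes the duration of one 1024-sample frame at two common sample rates and builds a stream-copy command that writes M4A instead of raw AAC. The function names are made up for illustration, the file names are the ones from the question, and the -vn flag (drop the video stream) is my addition.

```python
AAC_FRAME_SAMPLES = 1024  # samples per AAC frame


def frame_duration_ms(sample_rate):
    """Duration of one AAC frame in milliseconds."""
    return 1000.0 * AAC_FRAME_SAMPLES / sample_rate


def rewrap_command(src, dst):
    """Argument list for copying the AAC stream into another container."""
    # -vn drops the video stream; -c:a copy copies audio packets without re-encoding.
    return ["ffmpeg", "-i", src, "-vn", "-c:a", "copy", dst]


if __name__ == "__main__":
    for rate in (44100, 48000):
        print(f"{rate} Hz: {frame_duration_ms(rate):.2f} ms per frame")
    # Pass the list to subprocess.run(...) to actually run the rewrap.
    print(" ".join(rewrap_command("inputfile.mp4", "outputfile.m4a")))
```

One frame of roughly 21 to 23 ms is consistent with the 0.02 s shift reported in the question.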
For background, from Apple:
AAC requires data beyond the source PCM audio samples in order to correctly encode and decode audio samples due to the nature of the encoding algorithm. AAC encoding uses a transform over consecutive sets of 2048 audio samples, applied every 1024 audio samples (overlapped). For correct audio to be decoded, both transforms for any period of 1024 audio samples are needed. For this reason, encoders add at least 1024 samples of silence before the first ‘true’ audio sample, and often add more. This is called variously “priming”, “priming samples”, or “encoder delay”.
and
The lack of explicit representation for encoder delay and remainder samples is not a problem unique to AAC encoding. With MPEG-4 and ADTS/MPEG-2 bitstreams and file containers, there is still no satisfactory, explicit representation for either the encoder delay or remainder samples. MP3 also has these data dependencies and delays in its bitstream, as do proprietary codecs such as AC-3 and others.
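The priming and remainder samples described in these quotes can be worked out with simple arithmetic. The sketch below is only an illustration: the function name is hypothetical, the default of 1024 priming samples is the minimum stated in the first quote, and the 2112 used in the example is an assumed encoder-specific value, not something taken from the quotes.

```python
AAC_FRAME_SAMPLES = 1024  # samples per AAC frame


def packet_layout(source_samples, priming_samples=1024):
    """Return (packet_count, remainder_samples) for an encoded stream.

    The encoder prepends priming_samples of silence and then pads the
    final packet with remainder samples to fill a whole frame.
    """
    total = priming_samples + source_samples
    # Integer ceiling division: round up to a whole number of frames.
    packets = -(-total // AAC_FRAME_SAMPLES)
    remainder = packets * AAC_FRAME_SAMPLES - total
    return packets, remainder


if __name__ == "__main__":
    # Arbitrary source length with an assumed priming of 2112 samples.
    packets, remainder = packet_layout(5389, priming_samples=2112)
    print(f"{packets} packets, {remainder} remainder samples")
```

A raw AAC stream carries no record of either number, so a player has no way to know how many decoded samples at each end are padding. That is why both the start time and the total duration differ from the source.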
Upvotes: 15