FFmpeg - Audio encoding produces extra noise over audio

Question

I am trying to use FFmpeg to take a video (MP4 in this case) and copy it as another MP4. This is so that I can get the hang of decoding/encoding a video and go on to doing other things in that process. My code basically takes a video file, decodes the video and audio streams, and encodes the video and audio streams to an output video file.

As of now, my code only works for the video stream of the input file. The video part of the output file is exactly the same as the video part of the input file. However, the audio part is not. The audio part of the output contains the original audio, but with noise over it. Think of it as someone screaming into their mic or when audio gets too loud for a speaker to handle.

The way I'm handling the decoding/encoding process for the video and audio streams is the same, except with a difference in AVCodecContext settings (video --> frame_rate, width, height, etc.; audio --> sample_rate, channels, etc.).

This is currently the code that I'm working with:

The Video struct:

typedef struct Video {
    AVFormatContext* inputContext;
    AVFormatContext* outputContext;
    AVCodec* videoCodec;
    AVCodec* audioCodec;
    AVStream* inputStream;
    AVStream* outputStream;
    AVCodecContext* videoCodecContext_I; // Input
    AVCodecContext* audioCodecContext_I; // Input
    AVCodecContext* videoCodecContext_O; // Output
    AVCodecContext* audioCodecContext_O; // Output
    int videoStream; // Video stream index
    int audioStream; // Audio stream index
} Video;

The main code that handles the encoding/decoding (I've only included the audio side since the video side is the same):

int openVideo(Video* video, char* filename, char* outputFile) {
    video->inputContext = avformat_alloc_context();
    if (!video->inputContext) {
        printf("[ERROR] Failed to allocate input format context
");
        return -1;
    }
    if (avformat_open_input(&(video->inputContext), filename, NULL, NULL) < 0) {
        printf("[ERROR] Could not open the input file
");
        return -1;
    }

    if (avformat_find_stream_info(video->inputContext, NULL) < 0) {
        printf("[ERROR] Failed to retrieve input stream info
");
        return -1;
    }
    avformat_alloc_output_context2(&(video->outputContext), NULL, NULL, outputFile);
    if (!video->outputContext) {
        printf("[ERROR] Failed to create output context
");
        return -1;
    }
    printf("[OPEN] Video %s opened
", filename);
    return 0;
}

int prepareStreamInfo(AVCodecContext** codecContext, AVCodec** codec, AVStream* stream) {
    *codec = avcodec_find_decoder(stream->codecpar->codec_id);
    if (!*codec) {
        printf("[ERROR] Failed to find input codec
");
        return -1;
    }
    *codecContext = avcodec_alloc_context3(*codec);
    if (!codecContext) {
        printf("[ERROR] Failed to allocate memory for input codec context
");
        return -1;
    }
    if (avcodec_parameters_to_context(*codecContext, stream->codecpar) < 0) {
        printf("[ERROR] Failed to fill input codec context
");
        return -1;
    }
    if (avcodec_open2(*codecContext, *codec, NULL) < 0) {
        printf("[ERROR] Failed to open input codec
");
        return -1;
    }
    return 0;
}

int findStreams(Video* video, char* filename, char* outputFile) {
    if (openVideo(video, filename, outputFile) < 0) {
        printf("[ERROR] Video %s failed to open
", filename);
        return -1;
    }
    for (int i = 0; i < video->inputContext->nb_streams; i++) {
        video->inputStream = video->inputContext->streams[i];
        if (video->inputContext->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
            video->videoStream = i;
            if (prepareStreamInfo(&(video->videoCodecContext_I), &(video->videoCodec), video->inputStream) < 0) {
                printf("[ERROR] Could not prepare video stream information
");
                return -1;video->outputStream->time_base = video->audioCodecContext_O->time_base;
            }
        } else if (video->inputContext->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
            video->audioStream = i;
            if (prepareStreamInfo(&(video->audioCodecContext_I), &(video->audioCodec), video->inputStream) < 0) {
                printf("[ERROR] Could not prepare audio stream information
");
                return -1;
            }
        }
        video->outputStream = avformat_new_stream(video->outputContext, NULL);
        if (!video->outputStream) {
            printf("[ERROR] Failed allocating output stream
");
            return -1;
        }
        if (avcodec_parameters_copy(video->outputStream->codecpar, video->inputStream->codecpar) < 0) {
            printf("[ERROR] Failed to copy codec parameters
");
            return -1;
        }
    }
    if (video->videoStream == -1) {
        printf("[ERROR] Video stream for %s not found
", filename);
        return -1;
    }
    if (video->audioStream == -1) {
        printf("[ERROR] Audio stream for %s not found
", filename);
        return -1;
    }
    if (!(video->outputContext->oformat->flags & AVFMT_NOFILE)) {
    if (avio_open(&(video->outputContext->pb), outputFile, AVIO_FLAG_WRITE) < 0) {
      printf("Could not open output file %s", outputFile);
      return -1;
    }
  }
    return 0;
}

int prepareAudioOutStream(Video* video) {
    video->audioCodec = avcodec_find_encoder_by_name("mp2");
    if (!video->audioCodec) {
        printf("[ERROR] Failed to find audio output codec
");
        return -1;
    }
    video->audioCodecContext_O = avcodec_alloc_context3(video->audioCodec);
    if (!video->audioCodecContext_O) {
        printf("[ERROR] Failed to allocate memory for audio output codec context
");
        return -1;
    }
    // Quite possibly the issue
    video->audioCodecContext_O->channels = video->audioCodecContext_I->channels;
    video->audioCodecContext_O->channel_layout = av_get_default_channel_layout(video->audioCodecContext_O->channels);
    video->audioCodecContext_O->sample_rate = video->audioCodecContext_I->sample_rate;
    video->audioCodecContext_O->sample_fmt = video->audioCodec->sample_fmts[0];
    video->audioCodecContext_O->bit_rate = video->audioCodecContext_I->bit_rate;
    video->audioCodecContext_O->time_base = video->audioCodecContext_I->time_base;
    video->audioCodecContext_O->strict_std_compliance = FF_COMPLIANCE_EXPERIMENTAL;
    if (avcodec_open2(video->audioCodecContext_O, video->audioCodec, NULL) < 0) {
        printf("[ERROR] Failed to open audio output codec
");
        return -1;
    }
    if (avcodec_parameters_from_context(getAudioStream(video)->codecpar, video->audioCodecContext_O) < 0) {
        printf("[ERROR] Failed to fill audio stream
");
        return -1;
    }
    return 0;
}

int decodeAudio(Video* video, AVPacket* packet, AVFrame* frame) {
    int response = avcodec_send_packet(video->audioCodecContext_I, packet);
    if (response < 0) {
        printf("[ERROR] Failed to send audio packet to decoder
");
        return response;
    }
    while (response >= 0) {
        response = avcodec_receive_frame(video->audioCodecContext_I, frame);
        if (response == AVERROR(EAGAIN) || response == AVERROR_EOF) {
            break;
        } else if (response < 0) {
            printf("[ERROR] Failed to receive audio frame from decoder
");
            return response;
        }
        if (response >= 0) {
            // Do stuff and encode
            if (encodeAudio(video, frame) < 0) {
                printf("[ERROR] Failed to encode new audio
");
                return -1;
            }
        }
        av_frame_unref(frame);
    }
    return 0;
}

int encodeAudio(Video* video, AVFrame* frame) {
    AVPacket* packet = av_packet_alloc();
    if (!packet) {
        printf("[ERROR] Could not allocate memory for audio output packet
");
        return -1;
    }
    int response = avcodec_send_frame(video->audioCodecContext_O, frame);
    if (response < 0) {
        printf("[ERROR] Failed to send audio frame for encoding
");
        return response;
    }
    while (response >= 0) {
        response = avcodec_receive_packet(video->audioCodecContext_O, packet);
        if (response == AVERROR(EAGAIN) || response == AVERROR_EOF) {
            break;
        } else if (response < 0) {
            printf("[ERROR] Failed to receive audio packet from encoder
");
            return response;
        }
        packet->stream_index = video->audioStream;
        video->inputStream = getAudioStream(video);
        video->outputStream = video->outputContext->streams[packet->stream_index];
        packet->pts = av_rescale_q_rnd(packet->pts, video->inputStream->time_base, video->outputStream->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
        packet->dts = av_rescale_q_rnd(packet->dts, video->inputStream->time_base, video->outputStream->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
        packet->duration = av_rescale_q(packet->duration, video->inputStream->time_base, video->outputStream->time_base);
        packet->pos = -1;
        //av_packet_rescale_ts(packet, video->inputStream->time_base, video->outputStream->time_base);

        response = av_interleaved_write_frame(video->outputContext, packet);
        if (response < 0) {
            printf("[ERROR] Failed to write audio packet
");
            break;
        }
    }
    av_packet_unref(packet);
    av_packet_free(&packet);
    return 0;
}

int readFrames(Video* video, AVPacket* packet, AVFrame* frame) {
    if (!packet) {
        printf("[ERROR] Packet not allocated to be read
");
        return -1;
    }
    if (!frame) {
        printf("[ERROR] Frame not allocated to be read
");
        return -1;
    }
    if (prepareVideoOutStream(video) < 0) {
        printf("[ERROR] Failed to prepare output video stream
");
        return -1;
    }
    if (prepareAudioOutStream(video) < 0) {
        printf("[ERROR] Failed to prepare output audio stream
");
        return -1;
    }
    int frameNum = 0;
    while (av_read_frame(video->inputContext, packet) >= 0) {
        printf("[READ] Reading frame %i
", frameNum);
        if (packet->stream_index == video->videoStream) {
            if (decodeVideo(video, packet, frame) < 0) {
                printf("[ERROR] Failed to decode and encode video
");
                return -1;
            }
        } else if (packet->stream_index == video->audioStream) {
            if (decodeAudio(video, packet, frame) < 0) {
                printf("[ERROR] Failed to decode and encode audio
");
                return -1;
            }
        }
        av_packet_unref(packet);
        frameNum++;
    }
    // Flush encoder
    encodeVideo(video, NULL);
    encodeAudio(video, NULL);
    av_write_trailer(video->outputContext);
    return 0;
}

My main method that runs all the functions:

int main(int argc, char* argv[]) {
    Video* video = (Video*)malloc(sizeof(Video));
    initVideo(video);
    if (findStreams(video, argv[1], argv[2]) < 0) {
        printf("[ERROR] Could not find streams
");
        return -1;
    }

    AVDictionary* dic = NULL;
    if (avformat_write_header(video->outputContext, &dic) < 0) {
        printf("[ERROR] Error while writing header to output file
");
        return -1;
    }
    AVFrame* frame = av_frame_alloc();
    AVPacket* packet = av_packet_alloc();
    if (readFrames(video, packet, frame) < 0) {
        printf("[ERROR] Failed to read and write new video
");
        return -1;
    }
    freeVideo(video); // Frees all codecs and contexts and the video
    return 0;
}

I tried to lay out my code so that it can be read from top to bottom without needing to scroll up.

I realize that when copying a video, I can just pass the AVPacket to write to the output file, but I wanted to be able to work with the AVFrame in the future, so I wrote it this way. I have a feeling that the issue with the way my audio is behaving is because of the audio output AVCodecContext from the prepareAudioOutStream() function.

Reading the FFmpeg documentation has proved to be of little help with this issue as well as other online sources. I must be missing something (or have something unneeded) so anything that would point me in the right direction would be helpful.

Thank you.

Alexis Nealon · Accepted Answer

I'm an audio engineer, not a coder, but I hope this may be helpful. What may be happening is that your bit depth is being truncated; eg 24 bit audio being truncated to 16 bit, which will sound distorted and noisy. Each bit truncated from the most significant will clip 6dB of headroom. This will increase the noise floor and turn a loud but clear sine wave steadily into a distorted square wave as the significicant bit reduction increases.

Check for bit depth options in the re-encoding process. It may be that your encoder has a limit to its bit depth. Check the source bit depth and the re-encoded bit depth and see what the difference is. You could use VLC media player for this.

It is also recommended that you leave some headroom in the signal before encoding (at least 0.1 dB.) Pre-encoded audio may already be maxed out, and so re-encoding may add some slight distortion.

More info here:

Reducing sample bit-depth by truncating

https://www.apple.com/itunes/docs/apple-digital-masters.pdf

FFmpeg - Audio encoding produces extra noise over audio

Answers (2)

Related Questions