Difference in audio sample size between AVCodecContext-frame_size and AVFrame-nb_samples

Question

I am playing with ffmpeg to understand audio data, but I see there is a difference between audio data, AVCodecContext->frame_size shows it to be 1152, but the value I get fromAVFrame->nb_samples shows it to be 47. Both data fields describes the same thing i.e no of samples in an audio frame per channel, then why is there a difference. For reference I m pasting the AVFrame and AVCodecContext Object, which are huge, but also it will give you any information you wanted

AVCodecContext

{av_class = 0x7ffff74e24c0, log_level_offset = 0, codec_type = AVMEDIA_TYPE_AUDIO, codec = 0x7ffff751c600, codec_id = AV_CODEC_ID_MP3, codec_tag = 0, priv_data = 0x5555555ac040, 
  internal = 0x5555555a0e40, opaque = 0x0, bit_rate = 96000, bit_rate_tolerance = 0, global_quality = 0, compression_level = -1, flags = 0, flags2 = 0, extradata = 0x0, extradata_size = 0, time_base = {
    num = 1, den = 32000}, ticks_per_frame = 1, delay = 0, width = 0, height = 0, coded_width = 0, coded_height = 0, gop_size = 0, pix_fmt = AV_PIX_FMT_NONE, draw_horiz_band = 0x0, get_format = 
    0x7ffff69ae940 , max_b_frames = 0, b_quant_factor = 0, b_frame_strategy = 0, b_quant_offset = 0, has_b_frames = 0, mpeg_quant = 0, i_quant_factor = 0, 
  i_quant_offset = 0, lumi_masking = 0, temporal_cplx_masking = 0, spatial_cplx_masking = 0, p_masking = 0, dark_masking = 0, slice_count = 0, prediction_method = 0, slice_offset = 0x0, 
  sample_aspect_ratio = {num = 0, den = 1}, me_cmp = 0, me_sub_cmp = 0, mb_cmp = 0, ildct_cmp = 0, dia_size = 0, last_predictor_count = 0, pre_me = 0, me_pre_cmp = 0, pre_dia_size = 0, 
  me_subpel_quality = 0, me_range = 0, slice_flags = 0, mb_decision = 0, intra_matrix = 0x0, inter_matrix = 0x0, scenechange_threshold = 0, noise_reduction = 0, intra_dc_precision = 0, skip_top = 0, 
  skip_bottom = 0, mb_lmin = 0, mb_lmax = 0, me_penalty_compensation = 0, bidir_refine = 0, brd_scale = 0, keyint_min = 0, refs = 0, chromaoffset = 0, mv0_threshold = 0, b_sensitivity = 0, 
  color_primaries = AVCOL_PRI_RESERVED0, color_trc = AVCOL_TRC_RESERVED0, colorspace = AVCOL_SPC_RGB, color_range = AVCOL_RANGE_UNSPECIFIED, chroma_sample_location = AVCHROMA_LOC_UNSPECIFIED, 
  slices = 0, field_order = AV_FIELD_UNKNOWN, sample_rate = 32000, channels = 2, sample_fmt = AV_SAMPLE_FMT_FLTP, frame_size = 1152, frame_number = 1, block_align = 0, cutoff = 0, channel_layout = 3, 
  request_channel_layout = 0, audio_service_type = AV_AUDIO_SERVICE_TYPE_MAIN, request_sample_fmt = AV_SAMPLE_FMT_NONE, get_buffer2 = 0x7ffff69af1d0 , 
  refcounted_frames = 0, qcompress = 0, qblur = 0, qmin = 0, qmax = 0, max_qdiff = 0, rc_buffer_size = 0, rc_override_count = 0, rc_override = 0x0, rc_max_rate = 0, rc_min_rate = 0, 
  rc_max_available_vbv_use = 0, rc_min_vbv_overflow_use = 0, rc_initial_buffer_occupancy = 0, coder_type = 0, context_model = 0, frame_skip_threshold = 0, frame_skip_factor = 0, frame_skip_exp = 0, 
  frame_skip_cmp = 0, trellis = 0, min_prediction_order = -1, max_prediction_order = -1, timecode_frame_start = 0, rtp_callback = 0x0, rtp_payload_size = 0, mv_bits = 0, header_bits = 0, 
  i_tex_bits = 0, p_tex_bits = 0, i_count = 0, p_count = 0, skip_count = 0, misc_bits = 0, frame_bits = 0, stats_out = 0x0, stats_in = 0x0, workaround_bugs = 0, strict_std_compliance = 0, 
  error_concealment = 0, debug = 0, err_recognition = 0, reordered_opaque = -9223372036854775808, hwaccel = 0x0, hwaccel_context = 0x0, error = {0, 0, 0, 0, 0, 0, 0, 0}, dct_algo = 0, idct_algo = 0, 
  bits_per_coded_sample = 0, bits_per_raw_sample = 0, lowres = 0, coded_frame = 0x0, thread_count = 1, thread_type = 3, active_thread_type = 0, thread_safe_callbacks = 0, execute = 
    0x7ffff6e7bc30 , execute2 = 0x7ffff6e7bd00 , nsse_weight = 0, profile = -99, level = -99, skip_loop_filter = AVDISCARD_DEFAULT, 
  skip_idct = AVDISCARD_DEFAULT, skip_frame = AVDISCARD_DEFAULT, subtitle_header = 0x0, subtitle_header_size = 0, vbv_delay = 0, side_data_only_packets = 1, initial_padding = 0, framerate = {num = 0, 
    den = 1}, sw_pix_fmt = AV_PIX_FMT_NONE, pkt_timebase = {num = 0, den = 1}, codec_descriptor = 0x7ffff74fc910, pts_correction_num_faulty_pts = 0, pts_correction_num_faulty_dts = 0, 
  pts_correction_last_pts = 0, pts_correction_last_dts = 0, sub_charenc = 0x0, sub_charenc_mode = 0, skip_alpha = 0, seek_preroll = 0, debug_mv = 0, chroma_intra_matrix = 0x0, dump_separator = 0x0, 
  codec_whitelist = 0x0, properties = 0, coded_side_data = 0x0, nb_coded_side_data = 0, hw_frames_ctx = 0x0, sub_text_format = 0, trailing_padding = 0, max_pixels = 2147483647, hw_device_ctx = 0x0, 
  hwaccel_flags = 0, apply_cropping = 0, extra_hw_frames = 0, discard_damaged_percentage = 0}

AVFrame

{data = {
    0x5555555c2b00 "\373c;\270\203\001?\271\016[
\271\236\222\343\071\310\322H8tBs\271\371о9Q[\017\271\314\330̹ҍ\325\071\a\340\315\270\224v\245\271\322\002\067:\265\240a\271ڱҹle.9r\247\232\070L\344\362\270\360\370\376\270ñ@9H", 
    0x5555555b5e00 "Q
_\270\345\375\006\271\375\204S\271\206\301\235\071\021\373\202\071.V\352\270*2c\271\373\334x9\244\274\003\271*+\266\271!z\216\071xw\303\067\063\302ĸh\212\354\070\216\240	\271\233\365ǹ\271\235\360\267\355'\f9\345\021	\271\066\372\b8/\"\361\070\215", 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, linesize = {4608, 0, 0, 0, 0, 0, 0, 0}, extended_data = 0x5555555a2640, width = 0, height = 0, 
  nb_samples = 47, format = 8, key_frame = 1, pict_type = AV_PICTURE_TYPE_NONE, sample_aspect_ratio = {num = 0, den = 1}, pts = 0, pkt_pts = 0, pkt_dts = 0, coded_picture_number = 0, 
  display_picture_number = 0, quality = 0, opaque = 0x0, error = {0, 0, 0, 0, 0, 0, 0, 0}, repeat_pict = 0, interlaced_frame = 0, top_field_first = 0, palette_has_changed = 0, 
  reordered_opaque = -9223372036854775808, sample_rate = 32000, channel_layout = 3, buf = {0x5555555a3d80, 0x55555559fe40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, extended_buf = 0x0, nb_extended_buf = 0, 
  side_data = 0x0, nb_side_data = 0, flags = 0, color_range = AVCOL_RANGE_UNSPECIFIED, color_primaries = AVCOL_PRI_RESERVED0, color_trc = AVCOL_TRC_RESERVED0, colorspace = AVCOL_SPC_RGB, 
  chroma_location = AVCHROMA_LOC_UNSPECIFIED, best_effort_timestamp = 0, pkt_pos = 5412, pkt_duration = 508032, metadata = 0x0, decode_error_flags = 0, channels = 2, pkt_size = 432, qscale_table = 0x0, 
  qstride = 0, qscale_type = 0, qp_table_buf = 0x0, hw_frames_ctx = 0x0, opaque_ref = 0x0, crop_top = 0, crop_bottom = 0, crop_left = 0, crop_right = 0, private_ref = 0x0}

This is how I m decoding audio frames

while(av_read_frame(avc,avpacket) == 0){
        if(avpacket->stream_index == audioStream){
            int res = avcodec_send_packet(avctx,avpacket);
            if(res == AVERROR(EAGAIN)){
                continue;
            }
            else if(res<0){
                std::cout<<"error reading packet
";
                break;
            }
            else{
                AVFrame* avframe = av_frame_alloc();
                res = avcodec_receive_frame(avctx,avframe);
                if(res == AVERROR(EAGAIN)){
                    continue;
                }
                else if(res<0){
                    std::cout<<"Error reading frame";
                    break;
                }
                
                else{
                    audio_buffer.push(*avframe);
                }
                av_frame_free(&avframe);
            }
        }
        av_packet_unref(avpacket);
    }

Gyan · Accepted Answer

For MP3 the first 1105 samples are decoder delay, leaving 47 samples returned out of 1152. See How to compute the number of extra samples added by LAME or FFMPEG

Difference in audio sample size between AVCodecContext->frame_size and AVFrame->nb_samples

Answers (1)

Related Questions

Difference in audio sample size between AVCodecContext-&gt;frame_size and AVFrame-&gt;nb_samples

Answers (1)

Related Questions

Difference in audio sample size between AVCodecContext->frame_size and AVFrame->nb_samples