Moeiz Riaz
Moeiz Riaz

Reputation: 305

Create a 44-byte header with ffmpeg

I made a program using ffmpeg libraries that converts an audio file to a wav file. Except the only problem is that it doesn't create a 44-byte header. When input the file into Kaldi Speech Recognition, it produces the error:

ERROR (online2-wav-nnet2-latgen-faster:Read4ByteTag():wave-reader.cc:74) WaveData: expected 4-byte chunk-name, got read errror

I ran the file thru shntool and it reports a 78-byte header. Is there anyway I can get the standard 44-byte header using ffmpeg libraries?

Upvotes: 0

Views: 1166

Answers (1)

Moeiz Riaz
Moeiz Riaz

Reputation: 305

FFmpeg inserts some metadata about the encoder into the header file. Here is the hexdump of the header before the fix:

00000000 52 49 46 46 06 90 00 00 57 41 56 45 66 6d 74 20 |RIFF....WAVEfmt | 00000010 10 00 00 00 01 00 01 00 40 1f 00 00 80 3e 00 00 |........@....>..| 00000020 02 00 10 00 4c 49 53 54 1a 00 00 00 49 4e 46 4f |....LIST....INFO| 00000030 49 53 46 54 0e 00 00 00 4c 61 76 66 35 36 2e 33 |ISFT....Lavf56.3| 00000040 36 2e 31 30 30 00 64 61 74 61 c0 8f 00 00 00 00 |6.100.data......|

as you can see Lavf56.36.100 is the encoder in the header. Here is the portion of code that I used to get rid of it.

std::cout<<"------------------BEFORE-----------------------"<<std::endl;
std::cout<< av_dict_count ( (*ofmt_ctx)->metadata) <<std::endl;
std::cout<<"-------------------------------------------"<<std::endl; 
if(av_dict_set(&(*ofmt_ctx)->metadata,"ISFT",NULL, AV_DICT_IGNORE_SUFFIX)){
 std::cerr<<"Nope it, didn't work :("<<std::endl;
}

ret = avformat_write_header(*ofmt_ctx,&(*ofmt_ctx)->metadata );
if (ret < 0) {
  std::cout<<"-------------------------------------------"<<std::endl;
  av_log(NULL, AV_LOG_ERROR, "Error occurred when writing header to file\n");
  return ret;
}
std::cout<<"------------------AFTER-----------------------"<<std::endl;
std::cout<< av_dict_count ( (*ofmt_ctx)->metadata) <<std::endl;
std::cout<<"-------------------------------------------"<<std::endl;

Here is the hexdump afterwards: 00000000 52 49 46 46 e4 8f 00 00 57 41 56 45 66 6d 74 20 |RIFF....WAVEfmt | 00000010 10 00 00 00 01 00 01 00 40 1f 00 00 80 3e 00 00 |........@....>..| 00000020 02 00 10 00 64 61 74 61 c0 8f 00 00 00 00 00 00 |....data........| 00000030 00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00 |................|

shntool now report 44-bytes

(NOTE:ofmt_ctx was a ** in this function that I made, hence why referencing the metadata dictionary as &(*ofmt_ctx)->metadata)

Upvotes: 1

Related Questions