Reputation: 865
I'm working with audio recorded via QuickTime and saved in .m4a format. I'd like to use the Google Cloud Platform Speech API, and their recommendations are:
Do:
Use a lossless codec to record and transmit audio. FLAC or LINEAR16 is recommended.
Avoid:
Using mp3, mp4, m4a, mu-law, a-law or other lossy codecs during recording or transmission may reduce accuracy. If your audio is already in an encoding not supported by the API, transcode it to lossless FLAC or LINEAR16. If your application must use a lossy codec to conserve bandwidth, we recommend the AMR_WB, OGG_OPUS or SPEEX_WITH_HEADER_BYTE codecs, in that preferred order.
Source: https://cloud.google.com/speech/docs/best-practices
The API supports FLAC, WAV, or raw and I'm trying to transcode my file into one of these programmatically for use in an application. However, I'm unable to find a good Python library to do this.
UPDATE: Here's the answer: https://www.ffmpeg.org/ (not Python, but certainly the most comprehensive tool out there)
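Since the goal was to do this programmatically, here is a minimal sketch of calling ffmpeg from Python through subprocess. It assumes the ffmpeg binary is installed and on PATH. The helper names (build_ffmpeg_command, transcode_to_flac) are made up for illustration, and the mono / 16 kHz settings are my assumption about what suits speech recognition, not something stated above, so adjust them as needed.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build the ffmpeg argument list (hypothetical helper).

    ffmpeg picks the output codec from the destination extension,
    so a .flac destination yields FLAC output.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(src),           # input file, e.g. a QuickTime .m4a
        "-ac", "1",               # downmix to mono (assumption)
        "-ar", str(sample_rate),  # resample, 16 kHz by default (assumption)
        str(dst),
    ]


def transcode_to_flac(src, sample_rate=16000):
    """Transcode src to a .flac file next to it and return the new path."""
    dst = Path(src).with_suffix(".flac")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return dst
```

Calling transcode_to_flac("recording.m4a") should leave recording.flac next to the original. For LINEAR16 instead, pointing the destination at a .wav file and adding "-c:a", "pcm_s16le" before it should give 16-bit PCM.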
Upvotes: 7
Views: 16425
Reputation: 418
I use a Python library called pydub (pydub GitHub link). It is built on top of ffmpeg.
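Roughly how that looks for the .m4a case, as a sketch rather than a tested recipe: it assumes pydub is installed (pip install pydub) and that ffmpeg is available on PATH, since pydub shells out to it for formats like m4a. The function names are hypothetical, and the mono / 16 kHz conversion is an assumption on my part.

```python
from pathlib import Path


def flac_path_for(src):
    """Return the source path with its extension swapped for .flac (hypothetical helper)."""
    return str(Path(src).with_suffix(".flac"))


def m4a_to_flac(src, sample_rate=16000):
    """Convert an audio file to mono FLAC with pydub and return the new path."""
    # Imported here so the path helper above works even without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(src)        # decoding is delegated to ffmpeg
    audio = audio.set_channels(1)              # mono (assumption)
    audio = audio.set_frame_rate(sample_rate)  # 16 kHz by default (assumption)

    dst = flac_path_for(src)
    audio.export(dst, format="flac")           # encoding is delegated to ffmpeg
    return dst
```

m4a_to_flac("recording.m4a") should write recording.flac alongside the input. Exporting with format="wav" instead gives PCM WAV if LINEAR16 is preferred.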
Upvotes: 9