Reputation: 33410
Currently, I am trying to load 280,000 MP3 audio files in Python, where the average file duration is ~5 seconds. I am using Librosa for this purpose as well as for further processing (e.g. computing spectrograms) in later stages.
However, I realized that loading the files is very slow: on average it takes 370 milliseconds for each file to be loaded, decompressed and resampled. If I turn off resampling (i.e. librosa.load(..., sr=None)), it takes around 200 milliseconds, which is still not good considering the large number of files I have. Unsurprisingly, loading WAV files without resampling is very fast (< 1 ms), but with resampling it takes around 160 milliseconds.
Now I was wondering if there is any faster approach for doing this, whether directly in Python or using external tools in Linux with the condition that I can later load the results back to Python.
By the way, I have tried using multiprocessing with a pool of size 4 and achieved a 2-3x speed-up, but I am looking for more (preferably > 10x).
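For reference, the multiprocessing attempt described above might look roughly like the sketch below. It is only an illustration, not the exact code that was run: the helper names `load_one` and `load_all` and the `chunksize` value are assumptions, and the only Librosa call used is `librosa.load(path, sr=...)`.

```python
from multiprocessing import Pool

TARGET_SR = 16000  # target sample rate from the question (16 kHz)


def load_one(path):
    """Decode one file and resample it to TARGET_SR; returns (path, samples)."""
    import librosa  # imported lazily, inside the worker process

    samples, _ = librosa.load(path, sr=TARGET_SR)
    return path, samples


def load_all(paths, worker=load_one, processes=4, chunksize=64):
    """Apply `worker` to every path, keeping results in input order.

    `worker` must be a picklable, module-level callable.
    With processes <= 1 the work runs serially, which is handy for debugging.
    """
    if processes <= 1:
        return [worker(p) for p in paths]
    with Pool(processes=processes) as pool:
        # chunksize > 1 reduces inter-process overhead for many small tasks
        return list(pool.imap(worker, paths, chunksize=chunksize))
```

On platforms that spawn rather than fork worker processes, the call to `load_all` should sit under an `if __name__ == "__main__":` guard. Since decoding is CPU-bound, the speed-up from this approach is limited by the number of available cores.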
Note: the original files contain human voice and have a sample rate of 48 kHz and a bit rate of 64 kbps; I want to downsample them to 16 kHz.
Upvotes: 1
Views: 4139
Reputation: 5310
You could use pysox.
It's a thin Python wrapper around SoX, "the Swiss Army knife of sound processing programs."
Note: For faster processing (avoiding exec calls), you may also install and use soxbindings. All you need to do is to replace
import sox
with
import soxbindings as sox
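As a minimal sketch of how this could be applied to the question, the snippet below converts one MP3 to a 16 kHz mono WAV file, which can then be loaded back into Python quickly. It assumes the installed SoX build can decode MP3 (this often requires an extra format package), and the helper names `wav_path_for` and `convert_to_16k_wav` as well as the output naming scheme are made up for illustration.

```python
import os


def wav_path_for(mp3_path, out_dir):
    """Map e.g. 'clips/a.mp3' to '<out_dir>/a.wav' (hypothetical naming scheme)."""
    stem = os.path.splitext(os.path.basename(mp3_path))[0]
    return os.path.join(out_dir, stem + ".wav")


def convert_to_16k_wav(mp3_path, out_dir, sample_rate=16000):
    """Decode one MP3 with SoX, downmix to mono, resample, and write a WAV file."""
    import sox  # or: import soxbindings as sox

    tfm = sox.Transformer()
    # Set the output sample rate and channel count in one step
    tfm.convert(samplerate=sample_rate, n_channels=1)
    out_path = wav_path_for(mp3_path, out_dir)
    tfm.build(mp3_path, out_path)
    return out_path
```

The conversion only has to be done once; afterwards the WAV files can be read without resampling. Whether this reaches the desired 10x speed-up depends on the machine and is not something this sketch measures.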
Upvotes: 7