Mark K

Reputation: 9348

Librosa to get basic parameters of audio

I can get the basic parameters of an audio file with the wave module:

import wave

data = wave.open('c:\\sample.wav', mode = 'rb')
params = data.getparams()
print(params)

It returns:

(1, 2, 16000, 47104, 'NONE', 'not compressed')

That is: nchannels=1, sampwidth=2, framerate=16000, nframes=47104, comptype='NONE', compname='not compressed'
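On Python 3, getparams() returns a named tuple, so the fields can be read by name rather than by position. A minimal sketch, assuming a throwaway silent WAV written to a temporary directory as a stand-in for the real file; the path and the silent payload are illustrative only, and the header values mirror the labelled ones above:

```python
import os
import tempfile
import wave

# Hypothetical stand-in for 'c:\\sample.wav': a silent file with the
# same header values as the labelled parameters in the question.
path = os.path.join(tempfile.mkdtemp(), "sample.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(1)       # mono
    out.setsampwidth(2)       # 2 bytes per sample (16-bit PCM)
    out.setframerate(16000)   # samples per second
    out.writeframes(b"\x00\x00" * 47104)  # 47104 frames of silence

with wave.open(path, "rb") as data:
    params = data.getparams()

# Named access instead of tuple positions.
print(params.nchannels, params.sampwidth, params.framerate, params.nframes)

# Duration in seconds follows from the frame count and the frame rate.
duration_s = params.nframes / params.framerate
print(duration_s)
```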

I assume Librosa has similar functions, but I could not find any after searching.

Does Librosa have commands to produce similar results?

Thank you.

Upvotes: 6

Views: 6538

Answers (2)

Gianluca Micchi

Reputation: 1653

librosa does not have a specific function to load the metadata of an audio file as of version 0.10.1. There are a couple of functions for getting individual pieces of information, namely get_samplerate(path) and get_duration(*, path, ...) (which returns the duration in seconds and not in samples), but that's it.

To get the number of channels you must load the file and then inspect the shape of the audio. However, librosa does a lot of things under the hood when you load a file: by default, it downmixes the signal to mono and resamples it to 22,050 Hz. If you want to read the file as it is stored on disk, you must read it this way:

import librosa as lr

audio_fp = "path/to/audio/file"
duration = lr.get_duration(path=audio_fp)
audio, sr = lr.load(audio_fp, sr=None, mono=False, duration=0.01)
# n_samples is approximate, but should be correct most of the time
n_samples = int(round(duration * sr))
if audio.ndim == 1:
  channels = 1
else:
  channels = audio.shape[0]

The duration keyword asks librosa to read only the first 0.01 seconds of the audio, which is enough to find out the number of channels. This can enormously speed up the process, especially when you are dealing with long files.

Some %timeit magic shows that the wave method takes ~26 µs, while the librosa stack takes ~99 µs. Not using the duration keyword on a 60-minute track requires ~221 ms (milliseconds, not microseconds: more than a thousand times slower). An advantage of librosa is that it can transparently deal with encoded files such as mp3 and flac with only a minor hit in performance (~125 µs).

Upvotes: 1

Julian Fortune

Reputation: 180

Librosa Core has some of the functionality you're looking for. librosa.core.load will load a file like wave, but will not give as detailed information.

import librosa

# Load an example file in .ogg format
fileName = librosa.util.example_audio_file()
audioData, sampleRate = librosa.load(fileName)

print(audioData)
# [ -4.756e-06,  -6.020e-06, ...,  -1.040e-06,   0.000e+00]

print(audioData.shape)
# (1355168,)

print(sampleRate)
# 22050

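The shape of audioData will tell you the number of channels. A shape like (n,) is mono, and (2, n) is stereo. The n in the shape is the length of the audio in samples. Note that librosa.load downmixes to mono and resamples to 22050 Hz by default, so pass mono=False and sr=None if you want the file's own channel count and sample rate. If you want the length in seconds, check out librosa.core.get_duration.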
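The shape rule can be wrapped in a small helper. This is only a sketch: describe_audio is a hypothetical name, not a librosa function, and it assumes the array follows the (n,) or (channels, n) layout described here. Synthetic zero arrays stand in for loaded audio so the example runs without any file:

```python
import numpy as np

def describe_audio(y, sr):
    """Summarise an array laid out as (n,) for mono or (channels, n)."""
    y = np.asarray(y)
    channels = 1 if y.ndim == 1 else y.shape[0]
    n_samples = y.shape[-1]  # the last axis is always time
    return {
        "channels": channels,
        "n_samples": n_samples,
        "sample_rate": sr,
        "duration_s": n_samples / sr,
    }

# Synthetic stand-ins for the (audio, sample_rate) pair returned by a loader.
mono = np.zeros(22050, dtype=np.float32)
stereo = np.zeros((2, 44100), dtype=np.float32)

mono_info = describe_audio(mono, 22050)
stereo_info = describe_audio(stereo, 44100)
print(mono_info)
print(stereo_info)
```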

As @hendrick mentions in his comment, the Librosa advanced I/O page says librosa uses soundfile and audioread for audio I/O, and the load source code shows it's just wrapping around those libraries.

However, there shouldn't be any issue with using wave for loading the audio file and librosa for analysis as long as you follow the librosa API. Is there a particular problem you're having, or goal you need to achieve?
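For the wave-then-librosa route, the main thing to get right is the array layout. Below is a rough sketch, assuming 16-bit PCM input and assuming the analysis code wants floating-point samples shaped (n,) or (channels, n), as librosa.load returns them. read_wav_as_float is a hypothetical helper, and the tiny stereo file is written only so the example is self-contained:

```python
import os
import tempfile
import wave

import numpy as np

def read_wav_as_float(path):
    """Read a 16-bit PCM WAV with wave; return (y, sr) as float32."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sr = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    pcm = np.frombuffer(raw, dtype="<i2")   # little-endian int16, interleaved
    y = pcm.astype(np.float32) / 32768.0    # scale to [-1.0, 1.0)
    if n_channels > 1:
        y = y.reshape(-1, n_channels).T     # de-interleave to (channels, n)
    return y, sr

# Write a three-frame stereo file with known sample values.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
frames = np.array([[0, 16384], [-16384, 0], [32767, -32768]], dtype="<i2")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(frames.tobytes())  # rows are frames, so bytes are interleaved

y, sr = read_wav_as_float(demo_path)
print(y.shape, sr)
```

The resulting y and sr can then be passed wherever a librosa function takes an audio array and a sample rate. Other sample widths (8-bit, 24-bit, 32-bit) would need their own decoding and are left out of this sketch.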

Upvotes: 7
