Reputation: 185

load directly an audio file with librosa in dB

Is there a way to load an audio file with librosa directly in dB, instead of as the amplitude values obtained by:

import librosa

y, sr = librosa.load(filename, sr=None)

Upvotes: 2

Answers (2)

Abhi25t

Reputation: 4643

We need the Fourier transform first:

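import numpy as np

# y and sr come from librosa.load(filename, sr=None) in the question
dft_input = y

# Apply a Hann window to reduce spectral leakage, then take the DFT
window = np.hanning(len(dft_input))
windowed_input = dft_input * window
dft = np.fft.fft(windowed_input)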
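To see what a windowed DFT like this produces, here is a NumPy-only sketch on a synthetic signal instead of a loaded file. It assumes a hypothetical one-second 440 Hz sine sampled at 22050 Hz. It also swaps in np.fft.rfft, the real-input variant, so only the non-negative frequency bins are returned. Bin k sits at k * sr / N Hz, where N is the number of samples.

```python
import numpy as np

def windowed_spectrum(y, sr):
    """Return (frequencies in Hz, magnitudes) of a Hann-windowed DFT of y."""
    window = np.hanning(len(y))                 # taper the edges to reduce spectral leakage
    dft = np.fft.rfft(y * window)               # real input -> non-negative frequency bins only
    freqs = np.fft.rfftfreq(len(y), d=1 / sr)   # bin k sits at k * sr / N Hz
    return freqs, np.abs(dft)

# Hypothetical test signal: one second of a 440 Hz sine at 22050 Hz
sr = 22050
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440 * t)

freqs, magnitude = windowed_spectrum(y, sr)
peak_hz = freqs[np.argmax(magnitude)]
print(f"{peak_hz:.1f} Hz")  # 440.0 Hz
```

Because the window is one second long, the bins here are 1 Hz apart. A shorter signal gives coarser frequency resolution.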

The output of the DFT is an array of complex numbers, made up of real and imaginary components. Taking the magnitude with np.abs(dft) extracts the amplitude of each frequency bin:

amplitude = np.abs(dft)

On the decibel scale used for real-world sound (dB SPL), 0 dB corresponds roughly to the threshold of human hearing, and louder sounds have larger values. For digital audio (dBFS), however, 0 dB is the largest amplitude the format can represent, so all other amplitudes are negative. Because decibels are always relative to a reference, we have to provide one. Passing ref=np.max makes the largest magnitude in the array 0 dB.

amplitude_db = librosa.amplitude_to_db(amplitude, ref=np.max)
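librosa.amplitude_to_db converts magnitudes to decibels as 20 * log10(S / ref). The NumPy-only sketch below makes the role of ref concrete. It is not librosa's own code. It assumes librosa's documented defaults: amin=1e-5 is a floor that avoids log10(0), and top_db=80 clips values more than 80 dB below the peak. With ref=np.max the loudest bin lands at 0 dB. With a fixed ref such as 1.0, the values are no longer pinned to 0.

```python
import numpy as np

def amplitude_to_db_sketch(S, ref=1.0, amin=1e-5, top_db=80.0):
    """Approximate librosa.amplitude_to_db using NumPy only (a sketch)."""
    magnitude = np.abs(S)
    # ref may be a number or a callable such as np.max
    ref_value = ref(magnitude) if callable(ref) else ref
    db = 20.0 * np.log10(np.maximum(amin, magnitude))
    db -= 20.0 * np.log10(np.maximum(amin, ref_value))
    if top_db is not None:
        # clip everything more than top_db below the loudest value
        db = np.maximum(db, db.max() - top_db)
    return db

amplitude = np.array([2.0, 0.2, 0.02, 2e-6])
db_rel_max = amplitude_to_db_sketch(amplitude, ref=np.max)
db_rel_one = amplitude_to_db_sketch(amplitude, ref=1.0)
print(db_rel_max)  # about [0, -20, -40, -80]; the last entry is clipped by top_db
print(db_rel_one)  # about [6.02, -13.98, -33.98, -73.98]
```

The floor of -80 in both outputs comes from top_db, not from the signal itself.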

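Now we can plot the result using matplotlib. The DFT output is indexed by frequency, not by time. Bin k corresponds to k * sr / N Hz, where N is the number of samples. For a real signal, only the first half of the bins holds the non-negative frequencies.

import matplotlib.pyplot as plt

# Frequency in Hz of each DFT bin (the second half holds negative frequencies)
freqs = np.fft.fftfreq(y.size, d=1 / sr)
half = y.size // 2  # keep the non-negative frequencies only

plt.plot(freqs[:half], amplitude_db[:half])
plt.xlabel("Frequency (Hz)")
plt.ylabel("Magnitude (dB re max)")
plt.show()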

Note -

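The maximum value of amplitude_db is 0, and all other values are negative. If you don't like that, pass a fixed reference such as ref=1.0 instead of ref=np.max. Adding np.max(y) to the dB values does not make sense, because it mixes a linear amplitude with a logarithmic scale. Either way, the values are relative to the digital signal and do not necessarily correspond to sound levels a listener perceives.
If you use librosa for display, it shows y-axis values on both sides of zero with cutoffs at ±80. The 80 comes from amplitude_to_db's default top_db=80, which clips everything more than 80 dB below the peak. librosa.display.waveshow is meant for time-domain waveforms, so the plot is not very meaningful for a dB spectrum.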

import librosa.display

librosa.display.waveshow(amplitude_db, sr=sr)

Upvotes: 0

Anil_M

Reputation: 11453

As mentioned in this paper, librosa loads an audio file as a one-dimensional NumPy array.

From the documentation:

An audio signal is represented as a one-dimensional numpy array, denoted as y throughout librosa. Typically the signal y is accompanied by the sampling rate (denoted sr ) which denotes the frequency (in Hz) at which values of y are sampled.

From the code:

>>> type(y)
<type 'numpy.ndarray'>
>>> y
array([-0.00265948, -0.0045677 , -0.00412048, ..., -0.00179085,
       -0.00228079, -0.00238096], dtype=float32)
>>>

librosa uses the samples in the array y, together with the sampling rate sr, for its calculations and representations.
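If what you want is a loudness curve over time rather than a spectrum, one option is to compute the RMS of short frames of y and convert that to dB. librosa provides librosa.feature.rms for the framing step. The NumPy-only sketch below is an illustration under assumptions. It uses non-overlapping frames of a hypothetical 2048 samples and measures dB relative to digital full scale (an amplitude of 1.0), so a full-scale sine reads about -3 dB. Frame times follow from the sampling rate as sample index / sr.

```python
import numpy as np

def frame_rms_db(y, sr, frame_length=2048, amin=1e-5):
    """Return (frame start times in s, RMS level in dB re 1.0) for non-overlapping frames."""
    n_frames = len(y) // frame_length                 # drop the incomplete tail frame
    frames = y[: n_frames * frame_length].reshape(n_frames, frame_length)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    db = 20.0 * np.log10(np.maximum(amin, rms))       # amin avoids log10(0) on silence
    times = np.arange(n_frames) * frame_length / sr   # sample index / sr = seconds
    return times, db

sr = 22050
t = np.arange(2 * sr) / sr
# Hypothetical signal: 1 s of a full-scale 440 Hz sine followed by 1 s of silence
y = np.where(t < 1.0, np.sin(2 * np.pi * 440 * t), 0.0).astype(np.float32)

times, db = frame_rms_db(y, sr)
print(db[0], db[-1])  # roughly -3 dB for the sine, -100 dB (the amin floor) for silence
```

Overlapping frames (a hop shorter than the frame length) would give a smoother curve. Non-overlapping frames keep the sketch short.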

You may need to elaborate on what you mean by "load directly an audio file with librosa in dB" and on its intended purpose.

Upvotes: 3
