Reputation: 185
Is there a way to directly load an audio file with librosa in dB instead of amplitude as obtained by:
y, sr = librosa.load(filename, sr=None)
Upvotes: 2
Views: 1034
Reputation: 4643
We need the Fourier transform first:
import numpy as np
dft_input = y
window = np.hanning(len(dft_input))
windowed_input = dft_input * window
dft = np.fft.fft(windowed_input)
The output of the DFT is an array of complex numbers, made up of real and imaginary components. Taking the magnitude with np.abs(dft) extracts the amplitude information:
amplitude = np.abs(dft)
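To make it clearer what these magnitudes represent, here is a minimal sketch using a synthetic 440 Hz tone instead of a loaded file (the tone, the sampling rate of 8000 Hz, and the use of np.fft.rfft are my assumptions for illustration, not part of the original code). Each element of the magnitude array belongs to a frequency bin, not to a moment in time:

```python
import numpy as np

sr = 8000                                  # assumed sampling rate (Hz)
t = np.arange(sr) / sr                     # 1 second of sample times
y = np.sin(2 * np.pi * 440 * t)            # synthetic 440 Hz tone standing in for a loaded file

window = np.hanning(y.size)
spectrum = np.fft.rfft(y * window)         # rfft keeps only the non-negative frequencies
amplitude = np.abs(spectrum)               # one magnitude per frequency bin
freqs = np.fft.rfftfreq(y.size, d=1 / sr)  # frequency (Hz) of each bin

peak_hz = freqs[np.argmax(amplitude)]      # frequency of the bin with the largest magnitude
```

With this test tone the largest magnitude should sit at the bin for 440 Hz.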
The decibel scale for real-world audio (dB SPL) starts at 0 dB, which represents the quietest sound humans can hear, and louder sounds have larger values. For digital audio signals, however, 0 dB (dBFS) is the loudest possible amplitude, and all other amplitudes are negative. So we need to provide a reference point for the maximum amplitude.
amplitude_db = librosa.amplitude_to_db(amplitude, ref=np.max)
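As a rough illustration of what a conversion like this does, here is a simplified pure-Python sketch of the usual amplitude-to-dB formula with the maximum as reference. The function name amplitudes_to_db and the defaults amin=1e-5 and top_db=80 are my assumptions chosen for illustration; this is not librosa's actual implementation:

```python
import math

def amplitudes_to_db(amplitudes, amin=1e-5, top_db=80.0):
    """Convert non-negative amplitudes to dB relative to their maximum."""
    ref = max(max(amplitudes), amin)                 # reference = largest amplitude
    db = [20.0 * math.log10(max(a, amin) / ref)      # amin guards against log10(0)
          for a in amplitudes]
    floor = max(db) - top_db                         # limit the range below the peak
    return [max(v, floor) for v in db]

db = amplitudes_to_db([1.0, 0.5, 0.1, 0.0])
```

The largest amplitude maps to 0 dB, halving the amplitude costs about 6 dB, and silence is clipped at the floor instead of becoming minus infinity.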
Now, we can plot using matplotlib.
import matplotlib.pyplot as plt
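# amplitude_db holds one value per DFT frequency bin, not per time sample,
# so plot it against frequency (Hz) rather than time.
freqs = np.fft.fftfreq(y.size, d=1/sr)
half = y.size // 2  # keep only the non-negative frequencies
plt.plot(freqs[:half], amplitude_db[:half])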
plt.show()
Note: the max value of amplitude_db is 0, and the actual sound values will be negative. If you do not like that, you can add np.max(y) to them, but the result may not necessarily correspond to levels as perceived by humans.
If you use librosa for display, it will show y-axis values on both sides with cutoffs at +/-80, which is somewhat meaningless:
librosa.display.waveshow(amplitude_db, sr=sr)
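If what you are after is a level-over-time curve in dB, one possible approach (my own suggestion, not something librosa does when loading) is to compute a frame-wise RMS level and convert that to dB relative to full scale. The sketch below uses only numpy; the function name rms_db_over_time, the frame and hop sizes, and the synthetic test tone are assumptions for illustration:

```python
import numpy as np

def rms_db_over_time(y, sr, frame_length=2048, hop_length=512, amin=1e-10):
    """Return (times, level_db): frame-wise RMS level in dB relative to full scale (1.0)."""
    n_frames = 1 + max(0, len(y) - frame_length) // hop_length
    starts = np.arange(n_frames) * hop_length
    rms = np.array([np.sqrt(np.mean(y[s:s + frame_length] ** 2)) for s in starts])
    level_db = 20.0 * np.log10(np.maximum(rms, amin))   # amin avoids log10(0) on silence
    times = (starts + frame_length / 2) / sr            # centre of each frame, in seconds
    return times, level_db

# Synthetic stand-in for a loaded file: 1 s of a 440 Hz sine at half of full scale.
sr = 22050
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440 * t)
times, level_db = rms_db_over_time(y, sr)
```

The resulting times and level_db arrays can be passed straight to plt.plot(times, level_db). The values are in dB relative to a full-scale amplitude of 1.0, so they are still not sound pressure levels.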
Upvotes: 0
Reputation: 11453
librosa, as mentioned in this paper, loads an audio file as a one-dimensional numpy array.
From the documentation:
An audio signal is represented as a one-dimensional numpy array, denoted as y throughout librosa. Typically the signal y is accompanied by the sampling rate (denoted sr) which denotes the frequency (in Hz) at which values of y are sampled.
From the code:
>>> type(y)
<type 'numpy.ndarray'>
>>> y
array([-0.00265948, -0.0045677 , -0.00412048, ..., -0.00179085,
-0.00228079, -0.00238096], dtype=float32)
>>>
librosa makes use of the array elements of y and the sampling rate for its calculations and representation.
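As a small illustration of how y and sr relate, here is a sketch with a zero-filled stand-in array in place of a loaded file (the sampling rate and the 3-second length are arbitrary assumptions):

```python
import numpy as np

sr = 22050                                # assumed sampling rate in Hz
y = np.zeros(3 * sr, dtype=np.float32)    # stand-in for the array returned by librosa.load
duration_s = y.size / sr                  # length of the signal in seconds
sample_times = np.arange(y.size) / sr     # time (s) at which each element of y was sampled
```

The values in y are plain linear amplitudes, so any dB representation has to be computed from them afterwards.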
You may need to elaborate on what you mean by "load directly an audio file with librosa in dB" and its intended purpose.
Upvotes: 3