Reputation: 5386
I am trying to load a .wav file in Python using the scipy library. My final objective is to create a spectrogram of that audio file. The code for reading the file can be summarized as follows:
import scipy.io.wavfile as wav
(rate, sig) = wav.read(_wav_file_)  # wavfile.read returns (rate, data), in that order
For some .wav files I am receiving the following error:
WavFileWarning: Chunk (non-data) not understood, skipping it.
ValueError: Incomplete wav chunk.
Therefore, I decided to read the files with librosa instead:
import librosa
(sig, rate) = librosa.load(_wav_file_, sr=None)
That works in all cases; however, I noticed a difference in the colors of the spectrogram. The figure was otherwise exactly the same, but the colors were somehow inverted. More specifically, when I kept the same function for computing the spectrograms and changed only the way I read the .wav file, this difference appeared. Any idea what could cause it? Is there a default difference between the way the two approaches read the .wav file?
EDIT:
(rate1, sig1) = wav.read(spec_file) # rate1 = 16000
sig, rate = librosa.load(spec_file) # rate 22050
sig = np.array(α*sig, dtype = "int16")
Something that almost worked was to multiply sig by a constant α, the ratio between the maximum value of the signal from scipy's wavfile.read and that of the signal returned by librosa. The sample rates were still different, though.
Upvotes: 9
Views: 25694
Reputation: 862
To add on to what has been said, Librosa has a utility to convert integer arrays to floats.
float_audio = librosa.util.buf_to_float(sig)
I use this to great success when producing spectrograms of Pydub audiosegments. Keep in mind, one of its arguments is the number of bytes per sample. It defaults to 2. You can read about it more in the documentation here. Here is the source code:
def buf_to_float(x, n_bytes=2, dtype=np.float32):
    """Convert an integer buffer to floating point values.
    This is primarily useful when loading integer-valued wav data
    into numpy arrays.

    See Also
    --------
    buf_to_float

    Parameters
    ----------
    x : np.ndarray [dtype=int]
        The integer-valued data buffer
    n_bytes : int [1, 2, 4]
        The number of bytes per sample in `x`
    dtype : numeric type
        The target output type (default: 32-bit float)

    Returns
    -------
    x_float : np.ndarray [dtype=float]
        The input data buffer cast to floating point
    """
    # Invert the scale of the data
    scale = 1./float(1 << ((8 * n_bytes) - 1))
    # Construct the format string
    fmt = '<i{:d}'.format(n_bytes)
    # Rescale and format the data buffer
    return scale * np.frombuffer(x, fmt).astype(dtype)
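As a rough illustration of what that scaling does for 16-bit data, here is a minimal NumPy-only sketch. It is not librosa's code, and the helper name int16_to_float is made up for this example. It maps int16 samples, such as those returned by scipy.io.wavfile.read, to floats in [-1, 1) by dividing by 2**15:

```python
import numpy as np

def int16_to_float(x, dtype=np.float32):
    """Hypothetical helper: map int16 samples to floats in [-1.0, 1.0)."""
    x = np.asarray(x, dtype=np.int16)
    # Same idea as buf_to_float with n_bytes=2: divide by 2**15 = 32768.
    return (x.astype(np.float64) / 32768.0).astype(dtype)

samples = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)
as_float = int16_to_float(samples)
print(as_float)
```

The most negative int16 value maps exactly to -1.0, while the most positive one lands just below 1.0.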
Upvotes: 0
Reputation: 1032
This sounds like a quantization problem. If samples in the wave file are stored as float and librosa is just performing a straight cast to an int, any value with a magnitude less than 1 will be truncated to 0. More than likely, this is why sig is an array of all zeros. The float must be scaled to map it into the range of an int. For example,
>>> import numpy as np
>>> a = np.random.randn(10)
>>> a
array([-0.04250369, 0.244113 , 0.64479281, -0.3665814 , -0.2836227 ,
-0.27808428, -0.07668698, -1.3104602 , 0.95253315, -0.56778205])
Convert a to type int without scaling (values are truncated toward zero):
>>> a.astype(int)
array([ 0,  0,  0,  0,  0,  0,  0, -1,  0,  0])
Convert a to int with scaling for a 16-bit integer:
>>> b = (a* 32767).astype(int)
>>> b
array([ -1392, 7998, 21127, -12011, -9293, -9111, -2512, -42939,
31211, -18604])
Convert the scaled int back to float:
>>> c = b/32767.0
>>> c
array([-0.04248177, 0.24408704, 0.64476455, -0.36655782, -0.28360851,
-0.27805414, -0.0766625 , -1.31043428, 0.9525132 , -0.56776635])
c and a agree only to about 3 or 4 decimal places because of the quantization to int.
If librosa is returning a float, you can scale it by 2**15 and cast it to an int to get the same range of values that the scipy wave reader is returning. Since librosa is returning a float, chances are the values are going to lie within a much smaller range, such as [-1, +1], than a 16-bit integer, which will be in [-32768, +32767]. So you need to scale one of them to get the ranges to match. For example,
sig, rate = librosa.load(spec_file, mono=True)
sig = sig * 32767
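A slightly more defensive version of this scaling can be sketched with NumPy alone (the function name float_to_int16 is invented for this example). It clips to [-1, 1] before scaling, so that out-of-range values such as the -1.31 in the random example above cannot overflow the 16-bit range:

```python
import numpy as np

def float_to_int16(sig):
    """Hypothetical helper: scale floats in [-1, 1] to int16 samples."""
    sig = np.asarray(sig, dtype=np.float64)
    # Clip first so values outside [-1, 1] cannot wrap around in int16.
    clipped = np.clip(sig, -1.0, 1.0)
    return np.round(clipped * 32767).astype(np.int16)

sig = np.array([-1.3104602, -0.5, 0.0, 0.25, 0.95253315])
as_int = float_to_int16(sig)
back = as_int / 32767.0

print(as_int)
# Round-trip error for the in-range values is at most half a quantization step.
print(np.max(np.abs(back[1:] - sig[1:])))
```

Separately from the scaling, librosa.load resamples to 22050 Hz unless sr=None is passed, which would account for the differing rates (16000 vs 22050) reported in the question.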
Upvotes: 10
Reputation: 3461
If you do not want to do the quantization yourself, you could use pylab's pylab.specgram function to do it for you. You can look inside the function and see how it uses vmin and vmax.
It is not completely clear from your post (at least to me) what you want to achieve, as there is neither a sample input file nor a script from you. Anyway, to check whether the spectrogram of a wave file differs significantly depending on whether the signal data returned by the read function is float32 or int, I tested the following 3 functions.
_wav_file_ = "africa-toto.wav"

def spectogram_librosa(_wav_file_):
    import librosa
    import pylab
    import numpy as np
    # sr=None keeps the file's native sample rate; samples come back as float32
    (sig, rate) = librosa.load(_wav_file_, sr=None, mono=True, dtype=np.float32)
    pylab.specgram(sig, Fs=rate)
    pylab.savefig('spectrogram3.png')

def graph_spectrogram_wave(wav_file):
    import wave
    import pylab
    import numpy as np

    def get_wav_info(wav_file):
        wav = wave.open(wav_file, 'r')
        frames = wav.readframes(-1)
        # np.frombuffer replaces the deprecated fromstring for binary data
        sound_info = np.frombuffer(frames, dtype='int16')
        frame_rate = wav.getframerate()
        wav.close()
        return sound_info, frame_rate

    sound_info, frame_rate = get_wav_info(wav_file)
    pylab.figure(num=3, figsize=(10, 6))
    pylab.title('spectrogram pylab with wav_file')
    pylab.specgram(sound_info, Fs=frame_rate)
    pylab.savefig('spectrogram2.png')

def graph_wavfileread(_wav_file_):
    import matplotlib.pyplot as plt
    from scipy import signal
    from scipy.io import wavfile
    import numpy as np
    # wavfile.read returns integer samples for integer PCM files
    sample_rate, samples = wavfile.read(_wav_file_)
    frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate, nfft=1024)
    plt.pcolormesh(times, frequencies, 10*np.log10(spectrogram))
    plt.ylabel('Frequency [Hz]')
    plt.xlabel('Time [sec]')
    plt.savefig("spectogram1.png")

spectogram_librosa(_wav_file_)
#graph_wavfileread(_wav_file_)
#graph_spectrogram_wave(_wav_file_)
The resulting spectrograms, apart from minor differences in size and intensity, look quite similar regardless of the read method, library, or data type. That makes me wonder why the outputs need to be 'exactly' the same, and how exact they should be.
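One possible reason the intensity differs while the shape stays the same: multiplying a signal by a constant c multiplies its power spectrum by c squared, which in decibels is a constant offset of 20*log10(c). Below is a minimal NumPy sketch with a synthetic 440 Hz tone. The signal, the frame length, and the FFT-based power estimate are assumptions for illustration, not the functions used above.

```python
import numpy as np

rate = 16000
t = np.arange(rate) / rate
sig_float = 0.5 * np.sin(2 * np.pi * 440.0 * t)         # float samples in [-1, 1]
sig_int = np.round(sig_float * 32767).astype(np.int16)  # same tone as int16

def power_db(x):
    """Power spectrum in dB of one 1024-sample Hann-windowed frame."""
    frame = np.asarray(x[:1024], dtype=np.float64) * np.hanning(1024)
    power = np.abs(np.fft.rfft(frame)) ** 2
    return 10 * np.log10(power + 1e-12)  # small constant avoids log10(0)

peak_float = power_db(sig_float).max()
peak_int = power_db(sig_int).max()
offset = peak_int - peak_float

print(offset)  # close to 20*log10(32767), i.e. about 90.3 dB
```

So an int16 signal and its float counterpart should give spectrograms that differ by roughly 90 dB everywhere, which shifts the absolute levels but not the pattern.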
The librosa.load() function offers a dtype parameter, but it works only with float values anyway. Googling in this regard led me only to this issue, which wasn't much help, and this issue, which says that this is how it will stay with librosa, as internally it seems to use only floats.
Upvotes: 6