Reputation: 1589
I'm new to both Python and audio analysis. If this is not the right place for this question, please point me to the right place.
I have an mp3 audio file that contains only silence.
I converted it to .wav using sox:
sox input.mp3 output.wav
from scipy.io.wavfile import read
import matplotlib.pyplot as plt
(fs,x)=read('/home/vivek/Documents/VivekProjects/Silence/silence.wav')
##plt.rcParams['agg.path.chunksize'] = 5000 # for preventing overflow error.
fs
x.size/float(fs)
plt.plot(x)
Which generates this image:
I also tried the solution to this question: How to plot a wav file
from scipy.io.wavfile import read
import matplotlib.pyplot as plt
# read audio samples
input_data = read("/home/vivek/Documents/VivekProjects/Silence/silence.wav")
audio = input_data[1]
# plot all samples
plt.plot(audio)
# label the axes
plt.ylabel("Amplitude")
plt.xlabel("Time")
# set the title
plt.title("Sample Wav")
# display the plot
plt.show()
Which generated this image:
Question: How should I interpret the different colored bars (blue, green, yellow) in the chart? If you listen to the file it is only silence, so I expected to see just a flat line, if anything.
My mp3 file can be downloaded from here.
The sox converted wav file can be found here.
Even though the file is silent, even Dropbox generates a waveform for it. I can't figure out why.
Upvotes: 1
Views: 1596
Reputation: 11473
I suspected that your silence.mp3 file contained audio at a very low level (too quiet to hear), since I couldn't hear anything even with my speakers at maximum volume.
So I looked up how to plot audio frequency from an mp3 and used the approach from here.
First we convert the mp3 audio to wav. As the parent file is stereo, the converted wav file is stereo as well. To demonstrate that there are audio frequencies, we only need a single channel.
Once we have single-channel wav audio, we simply plot frequency against time index, with a color bar showing the power level in dB.
import scipy.io.wavfile
from pydub import AudioSegment
import matplotlib.pyplot as plt
import numpy as np
from numpy import fft as fft
#read mp3 file
mp3 = AudioSegment.from_mp3("silence.mp3")
#convert to wav
mp3.export("silence.wav", format="wav")
#read wav file
rate,audData=scipy.io.wavfile.read("silence.wav")
#the file is stereo, so the array has two columns; keep only the left channel
channel1=audData[:,0] #left
#channel2=audData[:,1] #right channel, not needed here
#create a time variable in seconds
time = np.arange(0, float(audData.shape[0]), 1) / rate
#Plot spectrogram of frequency vs time
plt.figure(1, figsize=(8,6))
plt.subplot(211)
Pxx, freqs, bins, im = plt.specgram(channel1, Fs=rate, NFFT=1024, cmap=plt.get_cmap('autumn_r'))
cbar=plt.colorbar(im)
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
cbar.set_label('Intensity dB')
plt.show()
As you can see in the image, silence.mp3 does contain audio frequencies, possibly with a power level of -30 to -45 dB.
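The same point can be checked numerically without reading colors off the plot. The following is only a sketch on a synthetic stand-in, not on the actual silence.mp3: the 440 Hz tone, the 44100 Hz sample rate, and the peak amplitude of 3 are all assumptions chosen for illustration. It shows that samples only a few integer steps away from zero can still carry a clearly identifiable frequency.

```python
import numpy as np

# Assumed values for illustration only (not taken from the real file)
rate = 44100          # samples per second
tone_hz = 440         # frequency of the synthetic test tone
t = np.arange(rate) / rate   # one second of sample times

# A very quiet tone: integer samples that never leave the range -3..3
quiet = np.round(3 * np.sin(2 * np.pi * tone_hz * t)).astype(np.int16)

# Magnitude spectrum of the real-valued signal and the matching frequency axis
spectrum = np.abs(np.fft.rfft(quiet))
freqs = np.fft.rfftfreq(quiet.size, d=1 / rate)

# Frequency bin with the largest magnitude
peak_hz = freqs[np.argmax(spectrum)]
print(quiet.min(), quiet.max(), peak_hz)
```

On the real file, `quiet` would be replaced by `channel1` and `rate` by the value returned from `scipy.io.wavfile.read`.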
Upvotes: 1
Reputation: 484
First, always check the shape of your data before plotting.
x.shape
## (3479040, 2)
The 2 here means you have two channels in your .wav file, and matplotlib by default plots each channel (column) in a different color. In this situation you need to slice the array: by rows to select a time window, and by column to select a single channel.
import matplotlib.pyplot as plt
ind = int(fs * 0.5) ## plot first 500ms
### plot as time series
plt.plot(x[:ind,:])
plt.figure()
#### Visualise distribution
plt.hist(x[:ind,0],bins = 10)
plt.gca().set_yscale('log')
#####
print(x.min(), x.max())
#### -3 3
As can be seen from the graph, the signal has a very low absolute value (between -3 and 3). Depending on the encoding of the .wav file (integer or float), this is translated to an amplitude, but probably a very low one, which is why it sounds silent.
I am not familiar with the precise encoding myself, but this page might help: http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html
- For all formats other than PCM, the Format chunk must have an extended portion. The extension can be of zero length, but the size field (with value 0) must be present.
- For float data, full scale is 1. The bits/sample would normally be 32 or 64.
- For the log-PCM formats (µ-law and A-law), the Rev. 3 documentation indicates that the bits/sample field (wBitsPerSample) should be set to 8 bits.
- The non-PCM formats must have a fact chunk.
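To put the (-3, 3) range in perspective, the peak of each channel can be expressed in dB relative to full scale (dBFS). This is a minimal sketch under stated assumptions: `peak_dbfs` is a hypothetical helper, not part of SciPy; full scale is taken as 32768 for 16-bit signed integer samples and as 1 for float samples (as in the list above); and the array below is synthetic stand-in data, not the real file.

```python
import numpy as np

def peak_dbfs(samples):
    """Peak level of each channel in dB relative to full scale (hypothetical helper)."""
    if np.issubdtype(samples.dtype, np.signedinteger):
        # e.g. 32768 for int16 samples
        full_scale = -float(np.iinfo(samples.dtype).min)
    elif np.issubdtype(samples.dtype, np.floating):
        # float data: full scale is 1
        full_scale = 1.0
    else:
        raise TypeError("unsigned (8-bit) data is not handled in this sketch")
    # Treat mono input of shape (n,) as a single column
    data = samples.reshape(len(samples), -1).astype(np.float64)
    peak = np.abs(data).max(axis=0)
    # An all-zero channel gives -inf rather than a warning
    with np.errstate(divide="ignore"):
        return 20.0 * np.log10(peak / full_scale)

# Synthetic stereo stand-in with values in -3..3 (assumed, not the real file)
rng = np.random.default_rng(0)
x = rng.integers(-3, 4, size=(1000, 2)).astype(np.int16)
x[0] = (3, -3)  # make sure both channels reach the assumed peak

levels = peak_dbfs(x)
print(levels)  # about -80.8 dBFS for both channels
```

If the real `x` holds 16-bit integers, a peak of 3 is about 80 dB below full scale, which is consistent with hearing nothing.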
PS: If you want to get into more advanced audio analysis, do check out this workshop, which I found super practical, especially the Energy and FFT parts.
Upvotes: 3