Reputation: 11
I am trying to run librosa.load() on a .wav file. The .wav file was downloaded from a YouTube video via youtube-dl and has the following properties:
However, the time series returned by librosa.load('file.wav') is the following:
(array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), 22050)
The .wav file is definitely not silent, so I don't understand why the output is 0 for every sample.
I also tried running librosa.load() on other .wav files of audio from other YouTube videos and had the same result.
If anyone has any idea what is causing this output, please let me know. Thanks in advance.
Upvotes: 1
Views: 4923
Reputation: 6259
YouTube does not store WAV files. When you ask youtube-dl for WAV, it downloads the MPEG-4 or Opus audio and then converts it to WAV using ffmpeg. This does not improve quality; it only takes up much more disk space. It is possible that something went wrong in this conversion and left your file filled with silence.
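One way to check whether the converted file really is silent, independently of librosa, is to read the raw samples and look at the peak value. The following is a minimal sketch using only the Python standard library. It assumes a mono or stereo 16-bit PCM WAV (an assumption about what the conversion produced), and write_wav and wav_peak are hypothetical helper names, not part of librosa or youtube-dl.

```python
import array
import sys
import wave


def write_wav(path, samples, sample_rate=22050):
    """Write a mono 16-bit PCM WAV file (only used here to create test input)."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(data.tobytes())


def wav_peak(path):
    """Return the largest absolute sample value in a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # convert little-endian file data to native order
    # A peak of 0 means every sample is zero, i.e. digital silence
    return max((abs(s) for s in samples), default=0)
```

If wav_peak('file.wav') returns 0, the file itself contains only silence, which points at the download or conversion step rather than at librosa.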
Since librosa can load MPEG4 audio (when suitable dependencies are installed), I would recommend downloading the audio in that format and loading it directly. See example code below.
Download MP4 audio using youtube-dl
youtube-dl -ci -f "bestaudio[ext=m4a]" https://www.youtube.com/watch?v=-5FKNViujeM -o '%(id)s.mp4'
Load the file in Python
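import librosa

sr = 16000  # target sample rate in Hz; librosa resamples on load
y, _ = librosa.load('-5FKNViujeM.mp4', duration=7, sr=sr)  # load the first 7 seconds
# Calculate RMS, one value per hop
rms_window = 1.0  # hop between RMS values, in seconds
rms = librosa.feature.rms(y=y, hop_length=int(sr * rms_window))
# Convert to dB; ref=0.0 is clamped to amin (1e-5), so values are relative to 1e-5
rms_db = librosa.amplitude_to_db(rms, ref=0.0)
print(list(rms_db[0]))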
Output
[1.9082947, 79.42775, 77.47075, 80.536514, 81.758995, 81.908295, 78.63311, 80.665535]
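To interpret these numbers: amplitude_to_db computes 20*log10(value/ref), and with ref=0.0 the reference appears to be clamped to librosa's default amin of 1e-5, so the values are decibels above 1e-5. The default top_db=80 also floors every value at 80 dB below the maximum, which would explain why the first entry sits exactly 80 dB under the largest one. The sketch below inverts the conversion under those assumptions; db_to_rms is a hypothetical helper name, not a librosa function.

```python
import math

# Assumed effective reference: librosa's default amin, since ref=0.0 is clamped to it
AMIN = 1e-5


def db_to_rms(db, ref=AMIN):
    """Invert 20*log10(rms/ref) to recover the linear RMS amplitude."""
    return ref * 10 ** (db / 20)


# Values copied from the output above
rms_db = [1.9082947, 79.42775, 77.47075, 80.536514,
          81.758995, 81.908295, 78.63311, 80.665535]

for db in rms_db:
    print(f"{db:8.3f} dB -> RMS {db_to_rms(db):.5f}")
```

Values around 80 dB correspond to an RMS of roughly 0.1 on librosa's floating-point scale of -1 to 1, so the audio loaded this way is clearly not silent.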
Upvotes: 7