Vani N

Reputation: 11

librosa.load() not accurately decoding audio file from Youtube

I am trying to run the command librosa.load() on a .wav file. The .wav file was downloaded from a youtube video via youtube-dl and has the following properties:

However, the returned time series from the command librosa.load('file.wav') is the following:

(array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), 22050)

The .wav file definitely has lots of noise so I don't quite understand why the output is 0 for every frame.

I also tried running librosa.load() on other .wav files of audio from other Youtube videos and had the same result.

If anyone has any idea what is causing this output, please let me know. Thanks in advance.

Upvotes: 1

Views: 4923

Answers (1)

Jon Nordby

Reputation: 6259

YouTube does not store WAV files. So when you ask youtube-dl for WAV, all it does is download the MPEG4/OPUS audio and convert it into a WAV file afterwards (using ffmpeg). This does not increase the quality; it just takes up a lot more space on disk. It is possible that something went wrong in this conversion, which left your file filled with silence.
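To check whether the converted file really contains only silence, independently of librosa, the samples can be read directly. The sketch below is only an illustration: `wav_peak` is a hypothetical helper name, and it assumes the WAV file holds 16-bit PCM, which may not be true for every conversion. Note also that NumPy elides the middle of long arrays when printing, so zeros at both ends of the printed array do not by themselves show that every sample is zero.

```python
import array
import sys
import wave


def wav_peak(path):
    """Peak absolute sample of a 16-bit PCM WAV file, scaled to the range 0..1."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            # Assumption of this sketch: 2 bytes per sample
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / 32768.0


# Usage: a result of 0.0 means every sample in the file is zero.
# print(wav_peak("file.wav"))
```

If this returns 0.0, the file itself is silent and the problem lies in the download or conversion step rather than in librosa.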

Since librosa can load MPEG4 audio (when suitable dependencies are installed), I would recommend downloading that, and loading it directly. See example code below.

Download MP4 audio using youtube-dl

youtube-dl -ci -f "bestaudio[ext=m4a]" -o '%(id)s.mp4' 'https://www.youtube.com/watch?v=-5FKNViujeM'

Load the file in Python

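import librosa

sr = 16000  # resample to 16 kHz on load
# Load only the first 7 seconds
y, _ = librosa.load('-5FKNViujeM.mp4', duration=7, sr=sr)

# Calculate RMS, one value per second.
# rms_window only sets the hop; each frame still uses the default frame_length.
rms_window = 1.0  # in seconds
rms = librosa.feature.rms(y=y, hop_length=int(sr * rms_window))
# Convert to dB; ref=0.0 is clamped to the function's amin floor
rms_db = librosa.amplitude_to_db(rms, ref=0.0)
print(list(rms_db[0]))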

Output

[1.9082947, 79.42775, 77.47075, 80.536514, 81.758995, 81.908295, 78.63311, 80.665535]
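As an aid to reading these numbers, here is a rough sketch that converts them to dB relative to full scale. It assumes librosa's default `amin=1e-5` and `top_db=80.0` for `amplitude_to_db`; with `ref=0.0` the reference is clamped to `amin`, so the printed values sit about 100 dB above the full-scale figure. The names `AMIN`, `TOP_DB`, and `dbfs` are illustrative only.

```python
import math

# Values printed by the example above
rms_db = [1.9082947, 79.42775, 77.47075, 80.536514,
          81.758995, 81.908295, 78.63311, 80.665535]

AMIN = 1e-5    # assumed default amin of librosa.amplitude_to_db
TOP_DB = 80.0  # assumed default top_db of librosa.amplitude_to_db

# ref=0.0 is clamped to AMIN, which shifts every value up by this offset
offset = -20.0 * math.log10(AMIN)

# Level relative to full scale (amplitude 1.0)
dbfs = [v - offset for v in rms_db]

# The quietest frame sits exactly TOP_DB below the loudest one,
# which suggests it was clipped to the top_db floor (near-silence).
floor_hit = abs(min(rms_db) - (max(rms_db) - TOP_DB)) < 1e-3

print([round(v, 1) for v in dbfs])
print("first frame clipped to floor:", floor_hit)
```

Read this way, the loudest frame is roughly 18 dB below full scale and the first frame is essentially silent, so the audio was decoded with real signal in it rather than all zeros.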

Upvotes: 7
