Reputation: 11
I loaded an MP3 file in Python with torchaudio and librosa:
import torchaudio
import librosa
filename='example.mp3'
array_tor, sample_rate_tor = torchaudio.load(filename,format='mp3')
array_lib, sample_rate_lib = librosa.load(filename, sr=sample_rate_tor)
print(len(array_tor.numpy()[0]), len(array_lib))  # the two lengths differ
The lengths of the two arrays are different. Why do they differ, and how can I make them the same?
If I convert example.mp3 to a WAV file with pydub:
from pydub import AudioSegment
audSeg = AudioSegment.from_mp3('example.mp3')
audSeg.export('example.wav', format="wav")
and load the WAV file with torchaudio, librosa, and soundfile:
import torchaudio
import librosa
import soundfile as sf
filename='example.wav'
array_tor_w, sample_rate_tor_w = torchaudio.load(filename,format='wav')
array_lib_w, sample_rate_lib_w = librosa.load(filename, sr=sample_rate_tor_w)
array_sfl_w, sample_rate_sfl_w = sf.read(filename)
print(len(array_tor_w.numpy()[0]), len(array_lib_w), len(array_sfl_w))  # all three lengths match
the three arrays have the same length and content, and that length also equals len(array_lib) from the MP3 file.
So it seems that torchaudio.load() is the one that handles MP3 files differently.
Upvotes: 1
Views: 3993
Reputation: 441
This is due to the underlying decoder library torchaudio uses.
Up until v0.11, torchaudio used libmad, which does not remove the extra padding when decoding MP3.
See https://github.com/pytorch/audio/issues/1500 for details.
In v0.12, torchaudio switched its MP3 decoder to FFmpeg, so the padding issue should be resolved.
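To check which side of that change an installed torchaudio is on, something like the following rough sketch could help. The helper name mp3_padding_fix_expected is made up for illustration and is not part of torchaudio; it only parses a version string such as the one in torchaudio.__version__ and compares it against 0.12, assuming the usual "major.minor..." layout.

```python
import re


def mp3_padding_fix_expected(version: str) -> bool:
    """Return True if `version` is 0.12 or newer.

    Hypothetical helper (not a torchaudio API). It assumes the version
    string starts with "major.minor", e.g. "0.11.0+cu113" or "0.12.1".
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison: (0, 11) < (0, 12) <= (0, 13) < (2, 0)
    return (major, minor) >= (0, 12)


# With torchaudio installed, pass torchaudio.__version__ instead of a literal.
print(mp3_padding_fix_expected("0.11.0+cu113"))  # False
print(mp3_padding_fix_expected("0.12.1"))        # True
```

If this returns False for your install, upgrading torchaudio is probably the simplest way to get MP3 lengths that agree with librosa.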
Upvotes: 3