Reputation: 2449
Using Python 3.7 and TensorFlow 2.0, I'm having a hard time reading WAV files from the UrbanSounds dataset. This question and answer are helpful because they explain that the input has to be a string tensor, but decoding seems to choke on the metadata at the start of the file instead of getting through to the actual audio data. Do I have to preprocess the string before I can load it as a float32 tensor? I already had to preprocess the data by converting it from 24-bit to 16-bit WAV, so the input pipeline is turning out to be much more cumbersome than I expected, and that required conversion is particularly frustrating. Here's what I'm trying so far:
import tensorflow as tf # this is TensorFlow 2.0
path_to_wav_file = '/mnt/d/Code/UrbanSounds/audio/fold1/101415-3-0-2.wav'
# Turn the wav file into a string tensor
input_data = tf.io.read_file(path_to_wav_file)
# Convert the string tensor to a float32 tensor
audio, sampling_rate = tf.audio.decode_wav(input_data)
This is the error I get at the last step:
2019-10-08 20:56:09.124254: W tensorflow/core/framework/op_kernel.cc:1546] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected fmt but found junk
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/ops/gen_audio_ops.py", line 216, in decode_wav
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Header mismatch: Expected fmt but found junk [Op:DecodeWav]
And here is the beginning of that string tensor. I'm no expert on wav files, but I think the part after "fmt" is where the actual audio data starts. Before that I think it's all metadata about the file.
input_data.numpy()[:70]
b'RIFFhb\x05\x00WAVEjunk\x1c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00fmt \x10\x00\x00\x00\x01\x00\x01\x00D\xac\x00\x00\x88X\x01\x00\x02\x00'
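To see what the parser runs into, here is a minimal sketch that walks the chunk headers of a RIFF/WAVE byte string. The helper list_riff_chunks is hypothetical (not part of TensorFlow) and assumes the standard layout: a 12-byte RIFF/size/WAVE header, followed by chunks that each consist of a 4-byte ID, a 4-byte little-endian size, and a payload padded to an even length. Note that the "fmt " chunk is itself header information (format code, channel count, sample rate); the samples live in a later "data" chunk.

```python
import struct


def list_riff_chunks(wav_bytes):
    """Return (chunk_id, size) pairs for the chunks in a RIFF/WAVE byte string.

    Hypothetical helper for inspection only; stops when fewer than 8 bytes
    (one chunk header) remain, so it also works on a truncated prefix.
    """
    riff_id, _riff_size, wave_id = struct.unpack("<4sI4s", wav_bytes[:12])
    if riff_id != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE byte string")
    chunks = []
    offset = 12  # first chunk starts right after 'RIFF' + size + 'WAVE'
    while offset + 8 <= len(wav_bytes):
        # Chunk header: 4-byte ID, then 4-byte little-endian payload size.
        chunk_id, size = struct.unpack("<4sI", wav_bytes[offset:offset + 8])
        chunks.append((chunk_id, size))
        # Skip header + payload, plus one pad byte when the size is odd.
        offset += 8 + size + (size & 1)
    return chunks
```

Run on the 70-byte prefix shown above, it reports a 28-byte junk chunk sitting before the 16-byte fmt chunk, which is exactly where the "Expected fmt but found junk" error is raised. The fmt payload in that dump also encodes 1 channel at 44100 Hz.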
Upvotes: 10
Views: 3678
Reputation: 430
It seems your error comes from TensorFlow expecting the fmt chunk to come immediately after the RIFF/WAVE header.
The TensorFlow code that does the parsing is here: https://github.com/tensorflow/tensorflow/blob/c9cd1784bf287543d89593ca1432170cdbf694de/tensorflow/core/lib/wav/wav_io.cc#L225
There's also an open issue, still awaiting a response from the TensorFlow team, that covers roughly the same error you've reported: https://github.com/tensorflow/tensorflow/issues/32382
Other libraries simply skip the JUNK chunk, so the file loads fine with them.
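Along those lines, one workaround is to drop the unwanted chunks yourself before calling tf.audio.decode_wav. The sketch below uses a hypothetical helper, keep_fmt_and_data (not a TensorFlow or SciPy API), that rebuilds the byte string with only the "fmt " and "data" chunks, "fmt " first, and recomputes the RIFF size. It assumes a well-formed file containing both chunks, and it does not touch the sample format, so decode_wav's requirement of 16-bit PCM input still applies.

```python
import struct


def keep_fmt_and_data(wav_bytes):
    """Rebuild a RIFF/WAVE byte string keeping only its 'fmt ' and 'data' chunks.

    Hypothetical helper: JUNK, LIST and any other chunks are dropped, and
    'fmt ' is placed first so a parser that expects it there accepts the file.
    """
    riff_id, _riff_size, wave_id = struct.unpack("<4sI4s", wav_bytes[:12])
    if riff_id != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE byte string")
    found = {}
    offset = 12
    while offset + 8 <= len(wav_bytes):
        chunk_id, size = struct.unpack("<4sI", wav_bytes[offset:offset + 8])
        if chunk_id in (b"fmt ", b"data") and chunk_id not in found:
            payload = wav_bytes[offset + 8:offset + 8 + size]
            # Keep the original header and payload; restore the pad byte for odd sizes.
            found[chunk_id] = (struct.pack("<4sI", chunk_id, size)
                               + payload + b"\x00" * (size & 1))
        offset += 8 + size + (size & 1)
    if b"fmt " not in found or b"data" not in found:
        raise ValueError("missing 'fmt ' or 'data' chunk")
    body = b"WAVE" + found[b"fmt "] + found[b"data"]
    # The RIFF size field counts everything after itself, starting at 'WAVE'.
    return b"RIFF" + struct.pack("<I", len(body)) + body
```

Used eagerly, you could read the file with open(path, 'rb').read() and pass the cleaned bytes to tf.audio.decode_wav. Inside a tf.data pipeline the helper is plain Python, so it would have to be wrapped, for example with tf.py_function, converting the input tensor with .numpy() first.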
Upvotes: 8
Reputation: 1058
It seems that your code fails for dual-channel audio files; it works for mono WAV files. In your case, you can try using SciPy instead:
from scipy.io import wavfile as wav
sampling_rate, data = wav.read('101415-3-0-2.wav')
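If you go the SciPy route but still want the kind of tensor tf.audio.decode_wav produces (float32, shape [samples, channels], values roughly in [-1.0, 1.0]), a small conversion on the array returned by wav.read is enough. The helper pcm16_to_float32 below is hypothetical; it assumes 16-bit PCM input (int16 samples), and dividing by 32768 is an assumed way of matching decode_wav's scaling.

```python
import numpy as np


def pcm16_to_float32(data):
    """Convert int16 samples from scipy.io.wavfile.read to float32 [samples, channels].

    Hypothetical helper; assumes 16-bit PCM input.
    """
    data = np.asarray(data)
    if data.dtype != np.int16:
        raise TypeError("expected 16-bit PCM samples, got %s" % data.dtype)
    # Scale so that -32768 maps to -1.0 and 32767 to just under 1.0.
    audio = data.astype(np.float32) / 32768.0
    if audio.ndim == 1:
        # wav.read returns a 1-D array for mono files; add a channel axis.
        audio = audio[:, np.newaxis]
    return audio
```

For example, after sampling_rate, data = wav.read('101415-3-0-2.wav'), calling tf.convert_to_tensor(pcm16_to_float32(data)) gives a float32 tensor you can feed into the rest of the pipeline.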
Upvotes: 5