TF model wrong output dimensions

Question

I am trying to make a model that is able to extract human speech from a recording. To do this I have loaded 1500 noisy files (some of these files are the exact same but with different speech to noise ratios (-1,1,3,5,7). I want my model to take in a wav file as a one dimensional array/tensor along the horizontal axis, and output a one dimensional array/tensor that I could then play. currently this is how my data is set up.

this is how my model is setup

an error I am having is that I am not able to make a prediction and when I am i get an array/tensor with only one element, instead one with 220500. The reason behind 22050 is that it is the length of the background noise that was overlapped into clean speech so every file is this length. I have been messing around with layers.Input because while I want my model to take in every row as one "object"/audio clip. I dont know if that is what's happening because the only "successful" prediction is an error

TF model wrong output dimensions

Answers (1)

Related Questions