tomatofighter

Reputation: 69

TF model wrong output dimensions

I am trying to make a model that can extract human speech from a recording. To do this I have loaded 1500 noisy files (some of these files are the exact same clip but with different speech-to-noise ratios: -1, 1, 3, 5, 7). I want my model to take in a wav file as a one-dimensional array/tensor along the horizontal axis, and output a one-dimensional array/tensor that I could then play. This is how my data is currently set up: [screenshot of the data]

This is how my model is set up: [screenshot of the model]

One error I am having is that I am not able to make a prediction, and when I am able to, I get an array/tensor with only one element instead of one with 220500. The reason behind 220500 is that it is the length of the background noise that was overlaid onto the clean speech, so every file is this length. [screenshot of the error] I have been messing around with layers.Input because I want my model to take in every row as one "object"/audio clip, but I don't know if that is what's happening, since the only "successful" prediction is an error.
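The one-element prediction can be reproduced with a toy model. This is only a sketch (the real model is only shown as an image, so the hidden-layer sizes here are assumptions): a final Dense layer with a single unit collapses each clip to one value, which matches the symptom described above.

```python
import numpy as np
from tensorflow.keras import layers, models

# Hypothetical model reproducing the symptom: the Dense(1) output layer
# maps every 220500-sample clip to a single number.
model = models.Sequential([
    layers.Input(shape=(220500,)),   # one clip per row
    layers.Dense(64, activation="relu"),
    layers.Dense(1),                 # <- only one output element per clip
])

clip = np.random.rand(1, 220500).astype("float32")
print(model.predict(clip, verbose=0).shape)  # (1, 1), not (1, 220500)
```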

Upvotes: 0

Views: 447

Answers (1)

kaosdev

Reputation: 309

The model you built expects data in the format (batch_size, 1, 220500), since in the input layer you declared an input_shape of (1, 220500).

For the data you are using, you should just use an input_shape of (220500,).
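A minimal sketch of that fix (the hidden-layer sizes are assumptions, since the original model is not shown in text): with input_shape=(220500,), each row of the data matrix is treated as one clip, and a matching output layer gives one prediction per sample.

```python
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(220500,)),   # no extra axis: one clip per row
    layers.Dense(64, activation="relu"),
    layers.Dense(220500),            # one output value per audio sample
])

batch = np.random.rand(4, 220500).astype("float32")  # 4 clips
print(model.predict(batch, verbose=0).shape)  # (4, 220500)
```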

Another problem you might encounter is that you are using a single unit in the last layer. This way the output of the model will be (batch_size, 1), but you need (batch_size, 220500) as the output.

For this last problem I suggest using a generative recurrent neural network.
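One possible shape for the recurrent idea, as a sequence-to-sequence sketch (frame length, frame count, and layer sizes are all assumptions): split each 220500-sample clip into frames, run an LSTM over the frames, and map every frame back to a denoised frame, so the output length matches the input length.

```python
import numpy as np
from tensorflow.keras import layers, models

frame_len = 500                      # 220500 samples = 441 frames of 500
n_frames = 441

model = models.Sequential([
    layers.Input(shape=(n_frames, frame_len)),        # (timesteps, features)
    layers.LSTM(128, return_sequences=True),          # one output per frame
    layers.TimeDistributed(layers.Dense(frame_len)),  # frame -> denoised frame
])

clip = np.random.rand(1, 220500).astype("float32").reshape(1, n_frames, frame_len)
out = model.predict(clip, verbose=0)
print(out.shape)  # (1, 441, 500); reshape back to (1, 220500) for playback
```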

Upvotes: 1
