Reputation: 1664
I'm learning neural networks and trying to create a speaker recognition system with TensorFlow. I want to know how utterance length affects a neural network. For example, suppose I have 1000 different sound recordings that all have the same length, and 1000 different sound recordings with varying lengths. How, in theory, will a neural network handle each kind of data? Will a network trained on a database of same-length recordings do better or worse? Why?
Upvotes: 5
Views: 628
Reputation: 5064
I assume your question can be reformulated as: how can a neural network process audio of different lengths?
The trick is that a signal of arbitrary size is converted into a sequence of fixed-size feature vectors. See my answers here and here.
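As a rough illustration (a minimal sketch assuming `librosa` for feature extraction; the file names, sample rate, and choice of MFCC features are just examples), two recordings of different lengths both become sequences of 13-dimensional vectors; only the number of vectors differs:

```python
import librosa

# Load two recordings of different lengths (hypothetical file names).
short_sig, sr = librosa.load("short_utterance.wav", sr=16000)
long_sig, sr = librosa.load("long_utterance.wav", sr=16000)

# Each signal becomes a sequence of fixed-size MFCC feature vectors.
# The vector dimensionality (13) is the same; only the frame count differs.
mfcc_short = librosa.feature.mfcc(y=short_sig, sr=sr, n_mfcc=13)  # (13, n_frames_short)
mfcc_long = librosa.feature.mfcc(y=long_sig, sr=sr, n_mfcc=13)    # (13, n_frames_long)
```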
Upvotes: 2
Reputation: 11387
It depends on the type of neural network. When designing one, you usually specify the number of input neurons, so you can't feed it data of arbitrary length. For longer sequences you have to either crop your data or use a sliding window, as in the sketch below.
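A minimal sketch of both options, assuming a 1-D NumPy signal and a fixed network input size of `N` samples (both values are illustrative):

```python
import numpy as np

N = 16000                         # fixed input size the network expects (assumed)
signal = np.random.randn(45000)   # stand-in for a longer recording

# Option 1: crop to N samples (or zero-pad if the recording is too short).
cropped = signal[:N] if len(signal) >= N else np.pad(signal, (0, N - len(signal)))

# Option 2: sliding window -- split the recording into overlapping
# N-sample chunks and feed each chunk to the network separately.
hop = N // 2
windows = [signal[i:i + N] for i in range(0, len(signal) - N + 1, hop)]
```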
However, some neural networks can process an arbitrary sequence of inputs, e.g. a Recurrent Neural Network. The latter seems like a very good candidate for your problem. Here is a good article that describes the implementation of a particular type of RNN, called Long Short-Term Memory, that works nicely with speech recognition.
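Since you mentioned TensorFlow, here is a minimal Keras sketch of an LSTM classifier that accepts variable-length utterances (the class count, feature size, and layer widths are assumptions, not recommendations). Recordings in a batch are zero-padded to the longest one, and a masking layer tells the LSTM to skip the padded frames:

```python
import tensorflow as tf

NUM_SPEAKERS = 10   # assumed number of speakers (classes)
NUM_FEATURES = 13   # e.g. MFCC coefficients per frame

# `None` in the time dimension lets the model accept sequences of any length.
model = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(None, NUM_FEATURES)),
    tf.keras.layers.LSTM(64),                                   # summarizes the whole utterance
    tf.keras.layers.Dense(NUM_SPEAKERS, activation="softmax"),  # speaker probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```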
Upvotes: 1