Feed Forward - Neural Networks Keras

Question

For the input to the feed-forward neural network that I have implemented in Keras, I just wanted to check that my understanding is correct.

[[ 25.26000023  26.37000084  24.67000008  23.30999947]
[ 26.37000084  24.67000008  23.30999947  21.36000061]
[ 24.67000008  23.30999947  21.36000061  19.77000046]...]

In the data above, each row is a time window of 4 inputs. My input layer is

model.add(Dense(4, input_dim=4, activation='sigmoid')) 

model.fit(trainX, trainY, nb_epoch=10000,verbose=2,batch_size=4)

The batch_size is 4. In theory, when I call the fit function, will it go over all these inputs in each nb_epoch? And does the batch_size need to be 4 in order for this time window to work?

Thanks John

lejlot · Accepted Answer

and batch_size is 4, in theory when I call the fit function will the function go over all these inputs in each nb_epoch?

Yes, each epoch is one full pass over all training samples.
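
As a rough illustration, the number of weight updates in one epoch follows from the number of samples and the batch size. The sample count below is made up rather than taken from the question, and the sketch assumes the last batch is allowed to be smaller than the others:

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of mini-batches (gradient updates) needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

n_samples = 1000  # hypothetical training-set size
for batch_size in (1, 4, 32, 1000):
    # Every sample is visited once per epoch regardless of batch_size;
    # only the number of updates changes.
    print(batch_size, updates_per_epoch(n_samples, batch_size))
```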

and does the batch_size need to be 4 in order for this time window to work?

No, these are completely unrelated things. A batch is simply a subset of your training data that is used to compute an approximation of the true gradient of the cost function. The bigger the batch, the closer you get to the true gradient (and to the original Gradient Descent), but each update gets more expensive to compute. The closer the batch size gets to 1, the more stochastic and noisy the approximation becomes (at exactly 1 it is Stochastic Gradient Descent). The fact that your batch_size matches the dimensionality of your data is just a coincidence and has no meaning.
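
To make the distinction concrete, here is a minimal NumPy sketch. The helper make_windows and the toy series are hypothetical stand-ins for however trainX was actually built. The window length fixes the number of columns (what input_dim=4 refers to), while the batch size only decides how many rows are taken at a time:

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into rows of `window` consecutive values."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - window + 1
    return np.stack([series[i:i + window] for i in range(n_rows)])

series = np.arange(10.0)             # toy series, stands in for the real data
X = make_windows(series, window=4)   # shape (7, 4): the 4 matches input_dim=4

for batch_size in (1, 2, 4, 7):
    batch = X[:batch_size]           # a batch is just some rows of X
    print(batch_size, batch.shape)   # the second dimension is always 4
```

Any of these batch sizes would be compatible with a Dense layer declared with input_dim=4.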

Let me put this in a more general setting. In gradient descent with an additive loss function (which neural nets usually use), you move against the gradient, which is

grad_theta [ 1/N SUM_i=1^N loss(x_i, pred(x_i), y_i|theta) ]
  = 1/N SUM_i=1^N grad_theta loss(x_i, pred(x_i), y_i|theta)

where loss is some loss function comparing your prediction pred(x_i) with the target y_i, and theta denotes the model parameters.

In the batch-based scenario, the rough idea is that you do not need to go over all examples, but only some strict subset, like batch = {(x_1, y_1), (x_5, y_5), (x_89, y_89), ...}, and use an approximation of the gradient of the form

1/|batch| SUM_(x_i, y_i) in batch: grad_theta loss(x_i, pred(x_i), y_i|theta)

As you can see, this is not related in any way to the space where the x_i live, so there is no connection with the dimensionality of your data.
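
The batch formula above can be sketched in NumPy. Everything here is an assumption made for illustration: a linear model pred(x) = x @ theta, a squared loss, and randomly generated data, none of which come from the question. The point is only that the batch gradient has the shape of theta whatever the batch size, and that it coincides with the full gradient when the batch contains every sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N samples of dimension 4 and a linear model pred(x) = x @ theta.
N, dim = 200, 4
X = rng.normal(size=(N, dim))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=N)
theta = np.zeros(dim)

def batch_gradient(X_b, y_b, theta):
    """Mean gradient of the squared loss (pred - y)^2 over one batch."""
    residual = X_b @ theta - y_b
    return 2.0 * X_b.T @ residual / len(y_b)

# "True" gradient: the average over all N samples.
full_grad = batch_gradient(X, y, theta)

for batch_size in (1, 4, 32, N):
    # Draw a random batch without repetition and compare it to the full gradient.
    idx = rng.choice(N, size=batch_size, replace=False)
    approx = batch_gradient(X[idx], y[idx], theta)
    error = np.linalg.norm(approx - full_grad)
    print(batch_size, approx.shape, round(float(error), 3))
```

The printed error will usually shrink as batch_size grows, although a single random draw can deviate from that trend. The gradient shape stays (4,) throughout, since it depends on the number of parameters and not on the batch size.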
