Reputation: 487
All the math I've seen for propagating data from one neural network layer to the next computes z as:
z = θ^T x + b
but Keras seems to diverge from that convention. It accepts its input matrices with samples as rows and features as columns, and get_weights() returns matrices whose shapes only satisfy the equation for z if it were instead:
z = xθ + b
Given the following example of a network learning an XOR gate with input dimensions 4x2 and output dimensions 4x1:
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

# XOR inputs (4 samples x 2 features) and targets (4 samples x 1 output)
X = np.array([[0, 0],
              [1, 0],
              [0, 1],
              [1, 1]])
Y = np.array([0, 1, 1, 0])

model = Sequential()
model.add(Dense(10, input_dim=2, activation='sigmoid'))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(X, Y, epochs=100000, batch_size=4, verbose=0)

print(model.get_weights())
print(model.predict(X))
The model weights for each layer come out with shapes 2x10, 10x10, and 10x1. The matrix multiplication fails to satisfy the first equation for z but appears to work for the second. Does Keras really handle its neural network computations this way, or am I misinterpreting something in the code? Should my input X be transposed instead? Any help is appreciated.
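For instance, checking shape compatibility with plain NumPy (just a sketch with dummy zero weights, W1 and b1 standing in for the first layer's kernel and bias) only works out for the second form:
import numpy as np

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])  # (4, 2): samples x features
W1 = np.zeros((2, 10))                          # first layer kernel, shaped as get_weights() reports
b1 = np.zeros(10)                               # first layer bias

z = np.dot(X, W1) + b1      # z = x·θ + b: (4, 2) @ (2, 10) -> (4, 10), bias broadcast per row
print(z.shape)              # (4, 10)
# np.dot(W1.T, X)           # z = θ^T·x: (10, 2) @ (4, 2) raises a shape-mismatch ValueError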
Upvotes: 1
Views: 1482
Reputation: 1340
This all works as expected. Please take a look at the code here, and search for the Dense class.
To make it a bit easier, here are two important snippets from that code. This is how the kernel is built:
input_dim = input_shape[-1]
self.kernel = self.add_weight(shape=(input_dim, self.units),
                              initializer=self.kernel_initializer,
                              name='kernel',
                              regularizer=self.kernel_regularizer,
                              constraint=self.kernel_constraint)
And this is how the multiplication is done:
def call(self, inputs):
    output = K.dot(inputs, self.kernel)
If you look at this approach, and at how your weights and input_dim are defined, it all makes perfect sense: the kernel is stored as (input_dim, units), and the input batch is multiplied on the left, i.e. output = x · kernel. If not, please leave a reply :)
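As a quick sanity check (a minimal sketch, assuming the trained model and input X from the question are still in scope), you can reproduce the first layer's output by hand with exactly that orientation:
import numpy as np

weights = model.get_weights()          # [W1, b1, W2, b2, W3, b3] for the three Dense layers
W1, b1 = weights[0], weights[1]        # W1 has shape (input_dim, units) = (2, 10)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a1 = sigmoid(np.dot(X, W1) + b1)       # first layer activations: x·kernel + bias, then sigmoid
print(a1.shape)                        # (4, 10): one row of activations per input sample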
Upvotes: 1
Reputation: 859
There is a problem with the way you set up your weights (shape). Take a look at this example, taken from here:
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
import numpy as np

# XOR data: 4 samples x 2 features, 4 samples x 1 target
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

model = Sequential()
model.add(Dense(8, input_dim=2))
model.add(Activation('tanh'))
model.add(Dense(1))
model.add(Activation('sigmoid'))

sgd = SGD(lr=0.1)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.fit(X, y, batch_size=1, epochs=1000)

print(model.predict(X))
"""
[[ 0.0033028 ]
[ 0.99581173]
[ 0.99530098]
[ 0.00564186]]
"""
Upvotes: 0