Reputation: 103
I am trying to fit a simple histogram model with custom weights and no input. It should fit a histogram for the data generated by:
train_data = [max(0,int(np.round(np.random.randn()*2+5))) for i in range(1000)]
The model is defined by
d = 15

class hist_model(tf.keras.Model):
    def __init__(self):
        super(hist_model, self).__init__()
        self._theta = self.add_weight(shape=[1, d], initializer='zero', trainable=True)

    def call(self, x):
        return self._theta
The problem I have is that training with model.fit doesn't work: the model weights don't change at all during training. I tried:
model = hist_model()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-2),
              loss="sparse_categorical_crossentropy")
history = model.fit(train_data, train_data, verbose=2, batch_size=1, epochs=3)
model.summary()
Which returns:
Epoch 1/3
1000/1000 - 1s - loss: 2.7080
Epoch 2/3
1000/1000 - 1s - loss: 2.7080
Epoch 3/3
1000/1000 - 1s - loss: 2.7080
Model: "hist_model_17"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
I tried writing a custom training loop for the same model, it worked well. This is the code for the custom training:
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

for epoch in range(3):
    running_loss = 0
    for data in train_data:
        with tf.GradientTape() as tape:
            loss_value = loss_fn(data, model(data))
        running_loss += loss_value.numpy()
        grad = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grad, model.trainable_weights))
    print(f'Epoch {epoch} loss: {running_loss}')
I still don't understand why the fit method doesn't work. What am I missing? Thanks!
Upvotes: 0
Views: 165
Reputation: 26698
The difference between the two methods is probably the loss function. Try running:
model.compile(optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
since the from_logits
parameter is set to False
by default, meaning the loss expects your model's output to already encode a probability distribution, whereas your model returns raw, unnormalized logits. Notice the loss differences now with from_logits=True
:
import numpy as np
import tensorflow as tf

d = 15

class hist_model(tf.keras.Model):
    def __init__(self):
        super(hist_model, self).__init__()
        self._theta = self.add_weight(shape=[1, d], initializer='zero', trainable=True)

    def call(self, x):
        return self._theta

train_data = [max(0, int(np.round(np.random.randn()*2 + 5))) for i in range(15)]

model = hist_model()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-2),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
history = model.fit(train_data, train_data, verbose=2, batch_size=1, epochs=10)
Epoch 1/10
15/15 - 0s - loss: 2.7021 - 247ms/epoch - 16ms/step
Epoch 2/10
15/15 - 0s - loss: 2.6812 - 14ms/epoch - 915us/step
Epoch 3/10
15/15 - 0s - loss: 2.6607 - 15ms/epoch - 1ms/step
Epoch 4/10
15/15 - 0s - loss: 2.6406 - 14ms/epoch - 955us/step
Epoch 5/10
15/15 - 0s - loss: 2.6209 - 19ms/epoch - 1ms/step
Epoch 6/10
15/15 - 0s - loss: 2.6017 - 18ms/epoch - 1ms/step
Epoch 7/10
15/15 - 0s - loss: 2.5829 - 15ms/epoch - 999us/step
Epoch 8/10
15/15 - 0s - loss: 2.5645 - 15ms/epoch - 1ms/step
Epoch 9/10
15/15 - 0s - loss: 2.5464 - 27ms/epoch - 2ms/step
Epoch 10/10
15/15 - 0s - loss: 2.5288 - 20ms/epoch - 1ms/step
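For a quick sanity check on what from_logits changes: the two conventions agree once a softmax is applied explicitly. The logits and label below are hypothetical and only illustrate the equivalence:

import tensorflow as tf

# Hypothetical values, only to show the from_logits behaviour.
logits = tf.constant([[2.0, 1.0, 0.1]])
y_true = tf.constant([0])

# from_logits=True: the loss applies the softmax internally.
loss_a = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(y_true, logits)

# Default from_logits=False: the loss expects probabilities, so apply the softmax yourself.
loss_b = tf.keras.losses.SparseCategoricalCrossentropy()(y_true, tf.nn.softmax(logits))

print(loss_a.numpy(), loss_b.numpy())  # both print roughly the same value (~0.417)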
I think the reduction method used might also have an influence. Check the docs for more details.
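As a side note, once the fit has converged you can read the estimated histogram off the trained weights by applying a softmax, and compare it with the empirical bin frequencies. A minimal sketch, assuming the model and train_data defined above:

import numpy as np
import tensorflow as tf

# The learned weights are logits over the d = 15 bins;
# a softmax turns them into the fitted histogram.
fitted_hist = tf.nn.softmax(model._theta).numpy().ravel()

# Empirical histogram of the training data, for comparison.
empirical_hist = np.bincount(train_data, minlength=d) / len(train_data)

print(np.round(fitted_hist, 3))
print(np.round(empirical_hist, 3))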
Upvotes: 1