Reputation: 241
I read a great blog post about a bag of tricks for image classification.
I am having a hard time figuring out how to implement the following part in TensorFlow, or rather, I have no idea how to do it or whether it is even possible.
So, start off with Adam: just set a learning rate that’s not absurdly high, commonly defaulted at 0.0001 and you’ll usually get some very good results. Then, once your model starts to saturate with Adam, fine tune with SGD at a smaller learning rate to squeeze in that last bit of accuracy!
Can you change the optimizer in some way without recompiling the model?
I have of course tried googling, but I can't find much information. Does anyone know whether this is possible in TensorFlow and, if so, how to do it? (A source with some information about it would also help.)
Upvotes: 0
Views: 1912
Reputation: 1508
You can start from the "Writing a training loop from scratch" guide in the TensorFlow documentation. Create two train_step functions, the first with an Adam optimizer and the second with an SGD optimizer.
import tensorflow as tf
from tensorflow import keras

# model, loss_fn and train_acc_metric are assumed to be defined
# as in the training-loop guide mentioned above.
optimizer1 = keras.optimizers.Adam(learning_rate=1e-3)
optimizer2 = keras.optimizers.SGD(learning_rate=1e-3)

@tf.function
def train_step1(x, y):
    # One training step that updates the weights with Adam.
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer1.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value

@tf.function
def train_step2(x, y):
    # Same step, but the weights are updated with SGD.
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer2.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value
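Since the two functions above differ only in the optimizer they use, one possible way to avoid the duplication is a small factory that returns a separate tf.function for each optimizer. This is only a sketch: make_train_step, the toy data and the tiny model are hypothetical and chosen just to make the snippet self-contained.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


def make_train_step(model, loss_fn, optimizer):
    """Hypothetical helper: build one tf.function bound to one optimizer."""

    @tf.function
    def train_step(x, y):
        with tf.GradientTape() as tape:
            logits = model(x, training=True)
            loss_value = loss_fn(y, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        return loss_value

    return train_step


# Toy data and model (assumptions, for illustration only).
rng = np.random.default_rng(0)
x = tf.constant(rng.normal(size=(64, 4)).astype("float32"))
y = tf.constant((rng.random(64) > 0.5).astype("int32"))

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(2),
])
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Each call to the factory creates its own tf.function and its own graph.
adam_step = make_train_step(model, loss_fn, keras.optimizers.Adam(learning_rate=1e-3))
sgd_step = make_train_step(model, loss_fn, keras.optimizers.SGD(learning_rate=1e-3))

losses = []
for epoch in range(4):
    # Adam for the first two epochs, SGD afterwards.
    step_fn = adam_step if epoch < 2 else sgd_step
    losses.append(float(step_fn(x, y)))
```

Both step functions update the same model weights, so the SGD phase continues from wherever Adam left off.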
Main loop:
import time

epochs = 20
train_step = train_step1  # start with Adam
start_time = time.time()
for epoch in range(epochs):
    # Switch to the SGD step once past the halfway point.
    if epoch > epochs // 2:
        train_step = train_step2
    total_train_loss = 0.
    # print("\nStart of epoch %d" % (epoch,))
    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        loss_value = train_step(x_batch_train, y_batch_train)
        total_train_loss += loss_value.numpy()
    ...
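Note that the graph of each train_step function is built separately. In graph mode, a single tf.function-decorated train_step cannot take the optimizer as a parameter that changes between iterations (Adam first, then SGD): each optimizer creates its own variables the first time it applies gradients, and a tf.function is only allowed to create variables on its first call.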
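As for the original question about recompiling: if you train with model.fit instead of a custom loop, one option may be to simply call compile again with the new optimizer. To my knowledge, compile swaps the optimizer, loss and metrics but does not reinitialize the layer weights. The sketch below checks that assumption on made-up data; the model, data and learning rates are all hypothetical.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Toy binary classification data (assumption, for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8)).astype("float32")
y = (x.sum(axis=1) > 0).astype("int32")

model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(2),
])
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Phase 1: train with Adam.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss=loss_fn, metrics=["accuracy"])
model.fit(x, y, epochs=3, batch_size=32, verbose=0)

# Snapshot the weights, then recompile with SGD at a smaller learning rate.
weights_before = [w.copy() for w in model.get_weights()]
model.compile(optimizer=keras.optimizers.SGD(learning_rate=1e-4),
              loss=loss_fn, metrics=["accuracy"])
weights_after = model.get_weights()

# True if recompiling left every weight array untouched.
weights_unchanged = all(
    np.array_equal(a, b) for a, b in zip(weights_before, weights_after)
)

# Phase 2: fine-tune with SGD, continuing from the Adam-trained weights.
model.fit(x, y, epochs=3, batch_size=32, verbose=0)
```

The new SGD optimizer starts with fresh internal state (Adam's moment estimates are not carried over), which is what you want when switching optimizers for fine-tuning.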
Upvotes: 1