Reputation: 465
I would like to fine-tune the embeddings produced by Google's Universal Sentence Encoder Large v3 (https://tfhub.dev/google/universal-sentence-encoder-large/3) on my own corpus. My current idea is to feed sentence pairs from my corpus to the encoder and then use an extra layer to classify whether they are semantically equivalent. My trouble is that I am not sure how to set this up, since it requires two USE models that share weights (I believe this is called a siamese network). Any suggestions would be greatly appreciated. Here is my current code:
import tensorflow as tf
import tensorflow_hub as hub

def train_and_evaluate_with_module(hub_module, train_module=False):
    # Build one embedding column per sentence in the pair, both configured
    # from the same module spec.
    embedded_text_feature_column1 = hub.text_embedding_column(
        key="sentence1", module_spec=hub_module, trainable=train_module)
    embedded_text_feature_column2 = hub.text_embedding_column(
        key="sentence2", module_spec=hub_module, trainable=train_module)

    # Binary classifier on top of the two sentence embeddings.
    estimator = tf.estimator.DNNClassifier(
        hidden_units=[500, 100],
        feature_columns=[embedded_text_feature_column1,
                         embedded_text_feature_column2],
        n_classes=2,
        optimizer=tf.train.AdagradOptimizer(learning_rate=0.003))

    estimator.train(input_fn=train_input_fn, steps=1000)

    train_eval_result = estimator.evaluate(input_fn=predict_train_input_fn)
    test_eval_result = estimator.evaluate(input_fn=predict_test_input_fn)

    training_set_accuracy = train_eval_result["accuracy"]
    test_set_accuracy = test_eval_result["accuracy"]

    return {
        "Training accuracy": training_set_accuracy,
        "Test accuracy": test_set_accuracy
    }
Upvotes: 2
Views: 3273
Reputation: 1238
See https://github.com/tensorflow/hub/issues/134: initialize a single hub.Module(..., trainable=True) object and call it twice, once per sentence. Both calls then go through the same set of variables, which gives you the shared-weight (siamese) encoder you are after.
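For illustration, here is a minimal sketch of that suggestion in TF1 style, using the module URL from the question. The placeholder names, the classification head (concatenated embeddings plus their absolute difference), and the loss are my own assumptions for the sketch, not something prescribed by the linked issue:

import tensorflow as tf
import tensorflow_hub as hub

# One trainable module instance; calling it twice reuses its variables,
# so both sentences are encoded by the same (shared) weights.
module = hub.Module(
    "https://tfhub.dev/google/universal-sentence-encoder-large/3",
    trainable=True)

# Illustrative placeholders (assumed input format: batches of raw strings).
sentence1 = tf.placeholder(tf.string, shape=[None])
sentence2 = tf.placeholder(tf.string, shape=[None])
labels = tf.placeholder(tf.int32, shape=[None])  # 1 = same meaning, 0 = not

embedding1 = module(sentence1)  # shape [batch, 512]
embedding2 = module(sentence2)  # same variables as above

# An assumed classification head: combine both embeddings and their
# element-wise absolute difference, then a small dense classifier.
features = tf.concat(
    [embedding1, embedding2, tf.abs(embedding1 - embedding2)], axis=1)
hidden = tf.layers.dense(features, 100, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 2)

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits))
train_op = tf.train.AdagradOptimizer(0.003).minimize(loss)

with tf.Session() as sess:
    # USE modules need table initialization in addition to variables.
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    # sess.run(train_op, feed_dict={sentence1: ..., sentence2: ..., labels: ...})

Feeding both sentences through the one module object is what makes this siamese: there is only a single set of encoder variables, so gradients from both branches update the same weights.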
Upvotes: 3