PDPDPDPD

Reputation: 465

How to fine-tune Universal Sentence Encoder 3 embeddings on my own corpus

I would like to fine-tune the embeddings produced by Google's Universal Sentence Encoder Large 3 (https://tfhub.dev/google/universal-sentence-encoder-large/3) on my own corpus. My current idea is to feed sentence pairs from my corpus to the encoder and then use an extra layer to classify whether they are semantically equivalent. My trouble is that I am not sure how to set this up, since it requires two USE models that share weights (I believe this is called a siamese network). Any suggestions would be greatly appreciated.

import tensorflow as tf
import tensorflow_hub as hub

def train_and_evaluate_with_module(hub_module, train_module=False):
    # One embedding column per sentence input; trainable=True allows
    # the module's weights to be fine-tuned during training.
    embedded_text_feature_column1 = hub.text_embedding_column(
      key="sentence1", module_spec=hub_module, trainable=train_module)

    embedded_text_feature_column2 = hub.text_embedding_column(
      key="sentence2", module_spec=hub_module, trainable=train_module)

    estimator = tf.estimator.DNNClassifier(
      hidden_units=[500, 100],
      feature_columns=[embedded_text_feature_column1,
                       embedded_text_feature_column2],
      n_classes=2,
      optimizer=tf.train.AdagradOptimizer(learning_rate=0.003))

    # train_input_fn, predict_train_input_fn, and predict_test_input_fn
    # are assumed to be defined elsewhere.
    estimator.train(input_fn=train_input_fn, steps=1000)

    train_eval_result = estimator.evaluate(input_fn=predict_train_input_fn)
    test_eval_result = estimator.evaluate(input_fn=predict_test_input_fn)

    return {
      "Training accuracy": train_eval_result["accuracy"],
      "Test accuracy": test_eval_result["accuracy"]
    }

Upvotes: 2

Views: 3273

Answers (1)

arnoegw

Reputation: 1238

See https://github.com/tensorflow/hub/issues/134: initialize a single hub.Module(..., trainable=True) object and call it twice, once per sentence input. Because both calls go through the same module object, the two branches share weights, which is exactly the siamese setup you describe.

Upvotes: 3
