Reputation: 465
I would like to fine-tune the embeddings produced by Google's Universal Sentence Encoder Large v3 (https://tfhub.dev/google/universal-sentence-encoder-large/3) on my own corpus. My current idea is to feed sentence pairs from my corpus to the encoder and then use an extra layer to classify whether they are semantically equivalent. My trouble is that I am not sure how to set this up, since it requires two USE models that share weights (I believe this is called a siamese network). Any suggestions would be greatly appreciated. Here is my current code:
import tensorflow as tf
import tensorflow_hub as hub

def train_and_evaluate_with_module(hub_module, train_module=False):
    # Build one embedding column per sentence in the pair, both configured
    # from the same module spec.
    embedded_text_feature_column1 = hub.text_embedding_column(
        key="sentence1", module_spec=hub_module, trainable=train_module)
    embedded_text_feature_column2 = hub.text_embedding_column(
        key="sentence2", module_spec=hub_module, trainable=train_module)

    # Binary classifier on top of the two sentence embeddings.
    estimator = tf.estimator.DNNClassifier(
        hidden_units=[500, 100],
        feature_columns=[embedded_text_feature_column1,
                         embedded_text_feature_column2],
        n_classes=2,
        optimizer=tf.train.AdagradOptimizer(learning_rate=0.003))

    estimator.train(input_fn=train_input_fn, steps=1000)

    train_eval_result = estimator.evaluate(input_fn=predict_train_input_fn)
    test_eval_result = estimator.evaluate(input_fn=predict_test_input_fn)

    training_set_accuracy = train_eval_result["accuracy"]
    test_set_accuracy = test_eval_result["accuracy"]

    return {
        "Training accuracy": training_set_accuracy,
        "Test accuracy": test_set_accuracy
    }
Upvotes: 2
Views: 3273
Reputation: 1238
See https://github.com/tensorflow/hub/issues/134: initialize a single hub.Module(..., trainable=True) object and call it twice, once per sentence. Both calls then go through the same set of variables, which gives you the shared-weight (siamese) encoder you are after.
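For illustration, here is a minimal sketch of that suggestion in TF1 style, using the module URL from the question. The placeholder names, the classification head (concatenated embeddings plus their absolute difference), and the loss are my own assumptions for the sketch, not something prescribed by the linked issue:

import tensorflow as tf
import tensorflow_hub as hub

# One trainable module instance; calling it twice reuses its variables,
# so both sentences are encoded by the same (shared) weights.
module = hub.Module(
    "https://tfhub.dev/google/universal-sentence-encoder-large/3",
    trainable=True)

# Illustrative placeholders (assumed input format: batches of raw strings).
sentence1 = tf.placeholder(tf.string, shape=[None])
sentence2 = tf.placeholder(tf.string, shape=[None])
labels = tf.placeholder(tf.int32, shape=[None])  # 1 = same meaning, 0 = not

embedding1 = module(sentence1)  # shape [batch, 512]
embedding2 = module(sentence2)  # same variables as above

# An assumed classification head: combine both embeddings and their
# element-wise absolute difference, then a small dense classifier.
features = tf.concat(
    [embedding1, embedding2, tf.abs(embedding1 - embedding2)], axis=1)
hidden = tf.layers.dense(features, 100, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 2)

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits))
train_op = tf.train.AdagradOptimizer(0.003).minimize(loss)

with tf.Session() as sess:
    # USE modules need table initialization in addition to variables.
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    # sess.run(train_op, feed_dict={sentence1: ..., sentence2: ..., labels: ...})

Feeding both sentences through the one module object is what makes this siamese: there is only a single set of encoder variables, so gradients from both branches update the same weights.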
Upvotes: 3