envi z

Reputation: 817

How is the semantic similarity score calculated in STS Benchmark dataset?

This is the GitHub repo : https://github.com/brmson/dataset-sts

The STS Benchmark dataset contains about 4000 pairs of similar and dissimilar sentences along with their semantic similarity scores.

Task that I'm trying to do: I have another custom dataset which also has pairs of similar and dissimilar sentences (with just 200 pairs).

I want to combine these two datasets (STS & my custom dataset) and use them to fine-tune a BERT model (BERT sentence transformer: https://github.com/UKPLab/sentence-transformers).

But the model needs a semantic similarity score for every pair of sentences. How do I compute that score for the sentence pairs in my custom dataset?

It has to be computed in the same way it was computed for the sentence pairs in the STS Benchmark dataset, since I plan to feed both into the same training setup (see the sketch below).
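For reference, this is roughly the training format sentence-transformers expects (a minimal sketch based on the library's `InputExample`/`CosineSimilarityLoss` workflow; the base model name, sentences, and labels below are placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Placeholder base model; any BERT-based sentence-transformers model would do.
model = SentenceTransformer("bert-base-nli-mean-tokens")

# Each pair needs a similarity label in [0, 1]. STS Benchmark gold scores are
# on a 0-5 scale, so they get divided by 5. My 200 custom pairs would need
# labels produced the same way.
train_examples = [
    InputExample(texts=["A man is playing a guitar.", "A person plays a guitar."], label=4.6 / 5),
    InputExample(texts=["A man is playing a guitar.", "A boy eats a sandwich."], label=0.2 / 5),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```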

This thread is very similar, but it doesn't quite answer the question I'm asking: Bert fine-tuned for semantic similarity

Upvotes: 2

Views: 2519

Answers (1)

Jindřich

Reputation: 11213

The STS datasets are manually annotated, i.e., there were humans in the loop who said how similar the sentences are. In the SemEval datasets, there is a fairly involved annotation procedure where each sentence pair is rated by multiple people to ensure some consensus. This is also how you can get scores for your custom dataset.
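If you want to reproduce that procedure on your own 200 pairs, a minimal sketch looks like this (the sentences and ratings below are made up; the key point is that the gold score is the average of several annotators' ratings on the 0-5 SemEval scale, which you then rescale to [0, 1] for training):

```python
# Hypothetical annotations: each pair rated 0-5 by several annotators
# following the SemEval/STS guidelines.
annotations = {
    ("A man is playing a guitar.", "A person plays a guitar."): [5, 4, 5],
    ("A man is playing a guitar.", "A boy eats a sandwich."): [0, 1, 0],
}

# Gold score = mean of the annotator ratings, still on the 0-5 scale.
gold_scores = {
    pair: sum(ratings) / len(ratings)
    for pair, ratings in annotations.items()
}

# sentence-transformers expects labels in [0, 1], so divide by 5 before training.
normalized = {pair: score / 5.0 for pair, score in gold_scores.items()}
```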

The STS score reported for a model is the correlation coefficient (typically Pearson's r or Spearman's ρ) between the similarity scores judged by the human annotators and the similarities estimated by your model.
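For completeness, here is a small sketch of how that evaluation correlation can be computed with SciPy (the score lists are invented placeholders):

```python
from scipy.stats import pearsonr, spearmanr

# Gold (human) scores on the 0-5 scale vs. e.g. cosine similarities from a model.
human_scores = [4.8, 0.4, 2.5, 3.9]
model_scores = [0.93, 0.12, 0.55, 0.71]

pearson, _ = pearsonr(human_scores, model_scores)
spearman, _ = spearmanr(human_scores, model_scores)
print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
```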

Upvotes: 2
