Laure

Reputation: 19

How to crawl semantically similar sentences

I want to create a corpus for a machine learning task. I have a small text dataset and want to crawl similar sentences from the web. I used the sentence_transformers package with a pretrained BERT model, doc2vec, and spaCy similarity to measure similarity. I set the threshold to 85%, but the sentences scoring above the threshold weren't really relevant. How can I crawl similar sentences from the web in Python?

Upvotes: 1

Views: 160

Answers (1)

DaveR

Reputation: 2358

I think you should train a large model on a large corpus and then use that model to generate random sentences. The gensim library provides several corpora that you can use either to find similar sentences or to train a model that generates similar sentences.
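For the "find similar sentences in a corpus" part, here is a rough sketch of the filtering step. It is not gensim's API: `embed`, `cosine`, and `filter_similar` are hypothetical helper names, and the bag-of-words `embed` is only a stand-in so the example runs with the standard library. In practice you would probably swap `embed` for a real sentence encoder (for example the sentence_transformers model from the question) and re-tune the threshold, since a cutoff that works for one embedding usually does not transfer to another.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Stand-in embedding (an assumption): lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_similar(seeds, candidates, threshold=0.5):
    """Keep candidates whose best similarity to any seed is >= threshold."""
    seed_vecs = [embed(s) for s in seeds]
    kept = []
    for cand in candidates:
        vec = embed(cand)
        best = max((cosine(vec, sv) for sv in seed_vecs), default=0.0)
        if best >= threshold:
            kept.append((cand, best))
    # Highest-scoring candidates first
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    seeds = ["the battery drains quickly on this phone"]
    candidates = [
        "this phone battery drains very quickly",
        "the weather is nice today",
    ]
    for sentence, score in filter_similar(seeds, candidates):
        print(f"{score:.2f}  {sentence}")
```

Scoring each candidate against its closest seed, rather than against the average of all seeds, keeps a small but varied seed set from being blurred into a single vector. Which of the two works better likely depends on your data.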

Upvotes: 1
