Leo B

Reputation: 161

Python NLP: Get synonyms for a word based on my own corpus

I have a large corpus of text (about 3 GB of plain text).

I want to build a search function.

When the user enters a keyword, I want to display a list of other keywords that are closely related.

For this, I don't want to use any generic synonym dictionary. Instead, I want a function to...

  1. see which other words keyword 1 usually "goes with" in my corpus
  2. find which other words those same words are also commonly associated with, besides keyword 1 (these would be keyword 2, keyword 3, etc.)
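To make the two steps concrete, here is a rough sketch using only the standard library. It assumes the corpus has already been split into lists of tokens. The function names, the window size, and the cut-offs are placeholders chosen for illustration, not anything prescribed.

```python
from collections import Counter, defaultdict


def build_cooccurrence(sentences, window=2):
    """Count how often each word appears within `window` tokens of another."""
    cooc = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    cooc[word][tokens[j]] += 1
    return cooc


def related_keywords(cooc, keyword, top_context=5, top_related=5):
    """Return words that share frequent neighbours with `keyword`."""
    # Step 1: the words that keyword 1 usually "goes with".
    context = [w for w, _ in cooc[keyword].most_common(top_context)]
    # Step 2: the other words those context words are associated with,
    # excluding the original keyword itself.
    scores = Counter()
    for ctx_word in context:
        for other, count in cooc[ctx_word].items():
            if other != keyword:
                scores[other] += count
    return [w for w, _ in scores.most_common(top_related)]


# Hypothetical toy corpus, only to show the call pattern.
toy_sentences = [["red", "car", "fast"], ["red", "truck", "fast"]]
toy_cooc = build_cooccurrence(toy_sentences, window=1)
print(related_keywords(toy_cooc, "car"))
```

Raw counts like these favour very frequent words, so on a real corpus some weighting (for example filtering stop words first) would probably be needed.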

Any ideas for approaches, libraries or examples here? I'm also open to suggestions for a better way of doing this.

Upvotes: 2

Views: 1577

Answers (1)

David Dale

Reputation: 11424

  1. Train a word2vec or FastText model on your corpus.
  2. For each keyword, find its nearest neighbors in the space of embeddings learned by the model above.

For example, you can use the Gensim library to do this in Python.

Upvotes: 1
