Matching phrase using TF-IDF and cosine similarity

Question

I have a dataframe that looks like this:

question                                answer
Why did the chicken cross the road?     to get to the other side
Who are you?                            a chatbot
Hello, how are you?                     Hi
...

I'd like to fit TF-IDF on this dataset. When the user enters a phrase, the question that best matches it should be chosen using cosine similarity. I can compute the TF-IDF values for the sentences in the training data as shown below, but how do I use them to get a cosine similarity score for a new phrase the user enters?

from sklearn.feature_extraction.text import TfidfVectorizer
v = TfidfVectorizer()
x = v.fit_transform(intent_data["sentence"])

Virtuoz · Accepted Answer

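I think you need something like this: transform the user's phrase with the already-fitted vectorizer, then compare it against the training matrix.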

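from sklearn.metrics.pairwise import cosine_similarity
# Reuse the fitted vectorizer: transform (not fit_transform) the new phrase.
cosine_similarities = cosine_similarity(x, v.transform(['user input'])).flatten()
# Row index of the training sentence most similar to the input.
best_match_index = cosine_similarities.argmax()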
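
To show the pieces working together, here is a minimal end-to-end sketch. It assumes the dataframe has the "question" and "answer" columns shown in the table (the snippet in the question uses a "sentence" column instead, so adjust the name to your data). The helper best_match and its min_score parameter are hypothetical names introduced only for this illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data mirroring the table in the question (assumed column names).
intent_data = pd.DataFrame({
    "question": [
        "Why did the chicken cross the road?",
        "Who are you?",
        "Hello, how are you?",
    ],
    "answer": ["to get to the other side", "a chatbot", "Hi"],
})

# Fit the vocabulary and IDF weights once, on the known questions.
v = TfidfVectorizer()
x = v.fit_transform(intent_data["question"])


def best_match(phrase, min_score=0.0):
    """Hypothetical helper: return (question, answer, score) for the
    closest known question, or None if no score exceeds min_score."""
    # transform() maps the phrase into the same feature space as x.
    scores = cosine_similarity(x, v.transform([phrase])).flatten()
    idx = int(scores.argmax())
    if scores[idx] <= min_score:
        # The phrase shares no vocabulary with any known question.
        return None
    row = intent_data.iloc[idx]
    return row["question"], row["answer"], float(scores[idx])


result = best_match("why did the chicken cross")
if result is not None:
    print(result[1])
```

The min_score check matters because argmax always returns an index, even when every similarity is zero (for example, when the input contains only words the vectorizer never saw). Raising the threshold is one way to fall back to a default reply instead of returning an arbitrary row.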
