Willy

Reputation: 313

Matching two strings together using NLTK?

So I am trying to write a program that will take in 2 strings, for example:

"I like pizza better cold"

And

"I really enjoy pizza when it is chilled"

And figure out if these two things match each other in comparison to something like:

"I like pizza better cold"

And

"Pizza really sucks."

Where the above would not be a match.

I have come across the NLTK library for Python to do this. Has anyone out there worked on something like this before and have any advice? Is NLTK the way to go? Are there any functions or features I should use?

I am thinking about splitting the strings into tokens, picking out the adjectives and nouns with part-of-speech tagging, and then possibly using a sentiment analysis algorithm to determine whether each sentence is positive or negative, so I can match the strings on that basis...

This is just a small side project I am working on for fun, so anything would be beneficial here :)

Cheers, Will

Upvotes: 3

Views: 9760

Answers (1)

Rohan Amrute

Reputation: 774

As I understand your question, you want to compare two sentences and measure how closely they match, for example as a percentage.

For finding the similarity between sentences you can use Jaccard Similarity or Cosine Similarity.
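As a rough illustration of Jaccard similarity, here is a minimal sketch that treats each sentence as a set of lowercase words and divides the size of the intersection by the size of the union. The function names and the regex tokenizer are my own choices for the example, not part of NLTK.

```python
import re

def tokenize(text):
    # Lowercase the text and keep runs of letters/apostrophes as a set of words.
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard_similarity(a, b):
    # |A intersect B| / |A union B| over the two word sets.
    words_a, words_b = tokenize(a), tokenize(b)
    union = words_a | words_b
    if not union:
        return 1.0  # assumption: two empty sentences count as identical
    return len(words_a & words_b) / len(union)

match = jaccard_similarity("I like pizza better cold",
                           "I really enjoy pizza when it is chilled")
no_match = jaccard_similarity("I like pizza better cold",
                              "Pizza really sucks.")
print(match, no_match)
```

On your example sentences the two scores come out close together, because plain word overlap cannot see that "cold" and "chilled" mean the same thing.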

For cosine similarity, refer to the question "How to calculate cosine similarity given 2 sentence strings? - Python".

If the cosine similarity is close to 0, the sentences are not similar; the closer it is to 1, the more similar they are.
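A minimal sketch of cosine similarity over word counts, assuming a simple bag-of-words representation (the helper names and the regex tokenizer are illustrative, not taken from the linked question):

```python
import math
import re
from collections import Counter

def word_counts(text):
    # Bag-of-words vector: lowercase word -> number of occurrences.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    vec_a, vec_b = word_counts(a), word_counts(b)
    # Dot product over the words the two sentences share.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)

print(cosine_similarity("I like pizza better cold",
                        "I really enjoy pizza when it is chilled"))
```

Because word counts are never negative, the result always falls between 0 and 1.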

NLTK can be used to find synonyms of the words in each sentence, so that the comparison captures some of the meaning rather than only exact word matches.

For finding synonyms you could use the following code:

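from nltk.corpus import wordnet as wn  # needs the WordNet data: nltk.download("wordnet")
synsets = wn.synsets("cold")  # replace "cold" with your word
# Collect the lemma names of every synset as candidate synonyms
synonyms = {lemma.name() for synset in synsets for lemma in synset.lemmas()}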
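One way the synonyms could feed into the similarity score is to expand each sentence's word set with its synonyms before comparing. The sketch below assumes a lookup function that returns a set of synonyms for a word. To keep it runnable without downloading any data, it uses a small hand-written table (`TOY_SYNONYMS`, purely hypothetical) in place of WordNet.

```python
import re

def tokenize(text):
    # Lowercase the text and keep runs of letters/apostrophes as a set of words.
    return set(re.findall(r"[a-z']+", text.lower()))

def expand(words, synonyms_of):
    # Add every synonym returned by the lookup to the original word set.
    expanded = set(words)
    for word in words:
        expanded.update(synonyms_of(word))
    return expanded

def synonym_aware_jaccard(a, b, synonyms_of):
    # Jaccard similarity computed on the synonym-expanded word sets.
    words_a = expand(tokenize(a), synonyms_of)
    words_b = expand(tokenize(b), synonyms_of)
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0

# Hypothetical hand-written table, for illustration only.
TOY_SYNONYMS = {
    "like": {"enjoy"}, "enjoy": {"like"},
    "cold": {"chilled"}, "chilled": {"cold"},
}

def toy_lookup(word):
    return TOY_SYNONYMS.get(word, set())

print(synonym_aware_jaccard("I like pizza better cold",
                            "I really enjoy pizza when it is chilled",
                            toy_lookup))
print(synonym_aware_jaccard("I like pizza better cold",
                            "Pizza really sucks.",
                            toy_lookup))
```

In practice you would swap `toy_lookup` for a function that gathers lemma names from `wn.synsets(word)`. WordNet returns many senses per word, so the real scores will be noisier than with this toy table.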

Upvotes: 3
