How to determine if two sentences talk about similar topics?

Question

Is there an algorithm or tool that lets me find associations between words? For example, I have the following groups of sentences:

(1)
    "My phone is on the table"
    "I cannot find the charger". # no reference to the phone
(2)
    "My phone is on the table"
    "I cannot find the phone's charger".

What I would like to do is find a connection, probably a semantic one, that lets me say that the two sentences in group (1) are about the same topic (phone), since the two terms (phone and charger) commonly occur together. The same goes for group (2). For group (1), I need something that can connect phone to charger. I was thinking of using Word2vec, but I am not sure whether it can do this. Do you have any suggestions for algorithms that I can use to determine topic similarity (i.e. sentences that are worded differently but share the same topic)?

vukojevicf · Accepted Answer

In Python you have SequenceMatcher in the standard difflib module that you can use. It compares the two strings as sequences of characters:

    from difflib import SequenceMatcher

    def similar(a, b):
        return SequenceMatcher(None, a, b).ratio()
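
Here is a rough sketch of how this behaves on the sentences from the question. The `similar_words` helper is my own addition for illustration (it is not part of difflib); it feeds lists of words to SequenceMatcher instead of raw characters. Keep in mind that both versions only measure how much of the text matches, so on their own they will not tell you that phone and charger are related.

```python
from difflib import SequenceMatcher

def similar(a, b):
    # ratio() returns a float in [0, 1]; 1.0 means the sequences are identical
    return SequenceMatcher(None, a, b).ratio()

def similar_words(a, b):
    # Hypothetical helper: compare lists of lowercase words, not characters
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

s1 = "My phone is on the table"
s2 = "I cannot find the charger"

print(similar(s1, s2))        # character-level score
print(similar_words(s1, s2))  # word-level score (only "the" is shared)
```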

If you want your own algorithm, I would suggest the Levenshtein distance (it calculates how many single-character insertions, deletions, or substitutions you need to turn one string (sentence) into another, which might be useful). I coded it myself like this for two strings:

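    def levenshtein(str1, str2):
        # edits[i][j] = edit distance between str2[:i] and str1[:j]
        edits = [[x for x in range(len(str1) + 1)] for y in range(len(str2) + 1)]
        # First column: turning str2[:i] into an empty string takes i deletions
        for i in range(len(str2) + 1):
            edits[i][0] = i
        for i in range(1, len(str2) + 1):
            for j in range(1, len(str1) + 1):
                if str2[i - 1] == str1[j - 1]:
                    # Same character: no extra edit needed
                    edits[i][j] = edits[i - 1][j - 1]
                else:
                    # One substitution, deletion, or insertion
                    edits[i][j] = 1 + min(edits[i - 1][j - 1], edits[i - 1][j],
                                          edits[i][j - 1])
        return edits[-1][-1]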

[EDIT] Since you want to check whether the sentences are about a similar topic, I would suggest any of the following algorithms (all are pretty easy):

Jaccard Similarity
K-means and Hierarchical Clustering Dendrogram
Cosine Similarity
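
To give an idea of two of these, here is a minimal sketch of Jaccard and cosine similarity over plain word counts, using only the standard library. The `tokenize` helper and the function names are my own assumptions for illustration. Both measures only see words the sentences literally share, so for group (1) the only overlap is "the"; to link phone and charger you would likely need to run the cosine part on word vectors (e.g. from Word2vec) instead of raw counts.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters; "phone's" becomes "phone" and "s"
    return re.findall(r"[a-z]+", text.lower())

def jaccard_similarity(a, b):
    # Shared distinct words divided by all distinct words
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def cosine_similarity(a, b):
    # Cosine of the angle between the two bag-of-words count vectors
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

s1 = "My phone is on the table"
s2 = "I cannot find the charger"
s3 = "I cannot find the phone's charger"

print(jaccard_similarity(s1, s2), cosine_similarity(s1, s2))  # group (1)
print(jaccard_similarity(s1, s3), cosine_similarity(s1, s3))  # group (2)
```

Group (2) should score higher than group (1) here simply because "phone" appears in both sentences.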
