PeakGen

Reputation: 22995

Calculating the similarity between 2 sentences

I would like to calculate the similarity between two sentences and get a percentage value that says how well they match each other. For example, sentences like:

1. The red fox is moving on the hill.
2. The black fox is moving in the bill.

I was considering Levenshtein distance, but I am not sure about it because it is described as a way to find the similarity between two words. Can Levenshtein distance help me here, or is there another method that would? I will be using JavaScript.

Upvotes: 1

Views: 3362

Answers (5)

Best Codes

Reputation: 11

Here is the code for an HTML page that does what you want:

<!DOCTYPE html>
<html>
<head>
    <title>Phrase Similarity Calculator</title>
    <script>
        function calculateSimilarity() {
            var phrase1 = document.getElementById("phrase1").value;
            var phrase2 = document.getElementById("phrase2").value;
            
            // Calculate similarity percentage
            var similarity = calculateLevenshteinDistance(phrase1, phrase2);
            var similarityPercentage = similarityToPercentage(similarity);
            
            // Display the result
            document.getElementById("result").innerHTML = "The similarity percentage is: " + similarityPercentage + "%";
        }
        
        function calculateLevenshteinDistance(phrase1, phrase2) {
            // Levenshtein Distance calculation
            var distance = [];
            for (var i = 0; i <= phrase1.length; i++) {
                distance[i] = [];
                distance[i][0] = i;
            }
            for (var j = 0; j <= phrase2.length; j++) {
                distance[0][j] = j;
            }
            for (var i = 1; i <= phrase1.length; i++) {
                for (var j = 1; j <= phrase2.length; j++) {
                    var cost = (phrase1.charAt(i - 1) === phrase2.charAt(j - 1)) ? 0 : 1;
                    distance[i][j] = Math.min(
                        distance[i - 1][j] + 1,         // Deletion
                        distance[i][j - 1] + 1,         // Insertion
                        distance[i - 1][j - 1] + cost   // Substitution
                    );
                }
            }
            return distance[phrase1.length][phrase2.length];
        }
        
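        function similarityToPercentage(similarity) {
            // Note: `similarity` holds the Levenshtein distance (the number of edits), not a similarity score.
            var maxLength = Math.max(document.getElementById("phrase1").value.length, document.getElementById("phrase2").value.length);
            // Two empty inputs are identical; return early to avoid dividing by zero.
            if (maxLength === 0) {
                return "100.00";
            }
            var percentage = ((maxLength - similarity) / maxLength) * 100;
            return percentage.toFixed(2);
        }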
    </script>
</head>
<body>
    <h1>Phrase Similarity Calculator</h1>
    <label for="phrase1">Phrase 1:</label>
    <input type="text" id="phrase1" placeholder="Enter phrase 1"><br>
    <label for="phrase2">Phrase 2:</label>
    <input type="text" id="phrase2" placeholder="Enter phrase 2"><br>
    <button onclick="calculateSimilarity()">Calculate Similarity</button>
    <p id="result"></p>
</body>
</html>

You can test it here (until the link expires):

https://onecompiler.com/html/3zmakmwy5

Upvotes: 0

PeterNL

Reputation: 670

A common method to compute the similarity of two sentences is cosine similarity. I don't know whether a JavaScript implementation exists. Cosine similarity looks at whole words rather than single letters. The web is full of explanations, for example here.
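
For illustration, here is a minimal sketch of word-level cosine similarity in plain JavaScript. It is not taken from any particular library: the helper names (`tokenize`, `wordCounts`, `cosineSimilarity`) are hypothetical, and the tokenizer simply assumes that words are runs of ASCII letters, digits or apostrophes.

```javascript
// Split a sentence into lowercase word tokens.
// Assumption: a "word" is a run of ASCII letters, digits or apostrophes.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Count how often each word occurs in a list of tokens.
function wordCounts(tokens) {
  const counts = new Map();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

// Cosine of the angle between the two word-count vectors, in the range [0, 1].
function cosineSimilarity(sentenceA, sentenceB) {
  const a = wordCounts(tokenize(sentenceA));
  const b = wordCounts(tokenize(sentenceB));

  // Dot product over the words of A (words missing from B contribute 0).
  let dot = 0;
  for (const [word, count] of a) {
    dot += count * (b.get(word) || 0);
  }

  // Euclidean length of a count vector.
  const norm = (counts) =>
    Math.sqrt([...counts.values()].reduce((sum, c) => sum + c * c, 0));

  const denominator = norm(a) * norm(b);
  // If either sentence has no words, treat the similarity as 0.
  return denominator === 0 ? 0 : dot / denominator;
}

const score = cosineSimilarity(
  "The red fox is moving on the hill.",
  "The black fox is moving in the bill."
);
console.log((score * 100).toFixed(2) + "%"); // prints "70.00%"
```

For the two sentences in the question, the shared words (the, fox, is, moving) give a dot product of 7 and both vectors have squared length 10, so the score is 7/10.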

Upvotes: 0

Aman

Reputation: 2276

Try this solution for JS string diff

Upvotes: 3

Emanuele Bezzi

Reputation: 1728

Use the Jaccard index. You can find implementations in any language, including JavaScript (here is one, though I haven't tested it personally).
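
As a rough sketch (not the linked implementation), the Jaccard index of two sentences can be computed as the number of distinct shared words divided by the number of distinct words overall. The function names `wordSet` and `jaccardIndex` are hypothetical, and the tokenizer assumes words are runs of ASCII letters, digits or apostrophes.

```javascript
// Build the set of distinct lowercase words in a sentence.
// Assumption: a "word" is a run of ASCII letters, digits or apostrophes.
function wordSet(sentence) {
  return new Set(sentence.toLowerCase().match(/[a-z0-9']+/g) || []);
}

// Jaccard index: |A ∩ B| / |A ∪ B|, in the range [0, 1].
function jaccardIndex(sentenceA, sentenceB) {
  const a = wordSet(sentenceA);
  const b = wordSet(sentenceB);

  // Convention chosen here: two empty sentences count as identical.
  if (a.size === 0 && b.size === 0) {
    return 1;
  }

  let shared = 0;
  for (const word of a) {
    if (b.has(word)) {
      shared++;
    }
  }

  // |A ∪ B| = |A| + |B| - |A ∩ B|
  return shared / (a.size + b.size - shared);
}

const score = jaccardIndex(
  "The red fox is moving on the hill.",
  "The black fox is moving in the bill."
);
console.log((score * 100).toFixed(2) + "%"); // prints "40.00%"
```

For the sentences in the question, 4 of the 10 distinct words are shared, hence 40%. Unlike cosine similarity, repeated words are counted only once.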

Upvotes: 1

Frank Visaggio

Reputation: 3718

This is what I would do, depending on how important this is. If it is medium to low priority, here is a simple algorithm:

  1. Scan all sentences and count how often each word occurs.
  2. Filter out the most common words, such as those that appear in 30% of sentences, i.e. don't count them. That way "at", "the" and "as" would hopefully not be counted.
  3. Then do your bag-of-words comparison.
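
The three steps above could be sketched roughly as follows. This is only one possible reading: the 30% cutoff is applied as "appears in more than 30% of sentences", the final comparison is assumed to be the share of distinct remaining words that both sentences have in common, and the function names and the sample corpus are made up for illustration.

```javascript
// Assumption: a "word" is a run of ASCII letters, digits or apostrophes.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Steps 1 and 2: find the words that occur in more than `threshold`
// (as a fraction) of the sentences in the corpus.
function findCommonWords(sentences, threshold = 0.3) {
  const sentenceCounts = new Map();
  for (const sentence of sentences) {
    // Use a Set so each word counts at most once per sentence.
    for (const word of new Set(tokenize(sentence))) {
      sentenceCounts.set(word, (sentenceCounts.get(word) || 0) + 1);
    }
  }
  const common = new Set();
  for (const [word, count] of sentenceCounts) {
    if (count / sentences.length > threshold) {
      common.add(word);
    }
  }
  return common;
}

// Step 3: compare the bags of remaining words.
// Assumed measure: shared distinct words / all distinct words.
function bagOfWordsSimilarity(sentenceA, sentenceB, commonWords) {
  const bagA = new Set(tokenize(sentenceA).filter((w) => !commonWords.has(w)));
  const bagB = new Set(tokenize(sentenceB).filter((w) => !commonWords.has(w)));
  const all = new Set([...bagA, ...bagB]);
  if (all.size === 0) {
    return 0; // nothing left to compare after filtering
  }
  let shared = 0;
  for (const word of bagA) {
    if (bagB.has(word)) {
      shared++;
    }
  }
  return shared / all.size;
}

// Hypothetical corpus; the first two sentences are the ones from the question.
const corpus = [
  "The red fox is moving on the hill.",
  "The black fox is moving in the bill.",
  "The dog sleeps near the river.",
  "The birds fly over the old bridge.",
  "The cat chased the mouse.",
  "The rain stopped at noon.",
  "The train left the station early.",
];
const commonWords = findCommonWords(corpus);
const score = bagOfWordsSimilarity(corpus[0], corpus[1], commonWords);
console.log((score * 100).toFixed(2) + "%");
```

Note that the filter only makes sense with a reasonably large set of sentences: with just two sentences, every shared word appears in 100% of them and would be filtered out.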

But the context of why you want to do this is really important. For example, the sentences you gave could be for students learning English. There are different algorithms I would use if I were trying to see whether crowdsourced users are describing the same paragraph versus whether article topics are similar enough for a suggested-reading section.

Upvotes: 0
