Valamas

Reputation: 24759

Compare percentage of words or characters

I have a list of phrases. Each phrase can be a single word or several words.

I would like to compare a phrase with each of its sibling phrases and rank the siblings by how closely they match. Character matching or word matching comes to mind. However, the list is quite dirty: it contains commas, hyphens, unclosed brackets, etc.

The ranking does not have to be terribly accurate. It is needed as a helper for content editors.

Example list:

Hello sir, how are you?

Top-ranking siblings of this phrase in the list:

Hello madam, how are you?
How are you today?
Today, are you well?

Is there an existing function out there to help with this?

Upvotes: 0

Views: 378

Answers (1)

devuxer

Reputation: 42384

I did something very similar recently. Here's an adapted version of my method:

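// Ranks phrases by how many distinct characters they share with testPhrase.
public IEnumerable<string> GetRankedPhrases(IEnumerable<string> phrases, string testPhrase)
{
    return phrases
        // string implements IEnumerable<char>, so Intersect yields the distinct characters both strings contain
        .Select(p => new { Phrase = p, Intersection = p.Intersect(testPhrase) })
        // Most shared characters first
        .OrderByDescending(pi => pi.Intersection.Count())
        .Select(pi => pi.Phrase);
}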

Make sure you have using System.Linq; (and using System.Collections.Generic; for IEnumerable<T>) at the top of your code file.

This compares each phrase in phrases with the test phrase. Those that share the most distinct characters with it float to the top of the list. Note that the comparison is case-sensitive and counts spaces and punctuation as characters too.
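Since the question also mentions word matching and asks about a percentage, here is a minimal sketch of a word-level variation (not the method above). It assumes a "word" is any run of letters, digits or underscores (the regex \w+), which drops the commas, hyphens and stray brackets, and that case should be ignored. The score is the Jaccard similarity: shared distinct words divided by all distinct words in either phrase, times 100. The names PhraseRanker, Words, WordOverlapPercent and GetRankedPhrasesByWords are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhraseRanker
{
    // Hypothetical helper: lower-cases the phrase and keeps only runs of word
    // characters (\w+), so commas, hyphens and unclosed brackets are ignored.
    public static HashSet<string> Words(string phrase) =>
        new HashSet<string>(
            Regex.Matches(phrase.ToLowerInvariant(), @"\w+")
                 .Cast<Match>()
                 .Select(m => m.Value));

    // Hypothetical helper: percentage of distinct words the two phrases share,
    // i.e. shared words / all distinct words in either phrase * 100 (Jaccard).
    public static double WordOverlapPercent(string a, string b)
    {
        var wordsA = Words(a);
        var wordsB = Words(b);
        int union = wordsA.Union(wordsB).Count();
        if (union == 0) return 0.0; // neither phrase contains any words
        int shared = wordsA.Intersect(wordsB).Count();
        return 100.0 * shared / union;
    }

    // Hypothetical helper: ranks sibling phrases by word overlap, best match first.
    public static IEnumerable<string> GetRankedPhrasesByWords(
        IEnumerable<string> phrases, string testPhrase) =>
        phrases.OrderByDescending(p => WordOverlapPercent(p, testPhrase));

    public static void Main()
    {
        const string test = "Hello sir, how are you?";
        var siblings = new[]
        {
            "Today, are you well?",
            "Hello madam, how are you?",
            "How are you today?",
        };

        // Print each sibling with its score, highest score first
        foreach (var p in GetRankedPhrasesByWords(siblings, test))
            Console.WriteLine($"{WordOverlapPercent(p, test):F1}%  {p}");
    }
}
```

With the question's three siblings, this puts "Hello madam, how are you?" first (4 of 6 distinct words shared), then "How are you today?" (3 of 6), then "Today, are you well?" (2 of 7). Because the score is a percentage rather than a raw count, a long phrase does not win just by containing many words.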

Upvotes: 1
