S31
S31

Reputation: 934

String vs column - percent match

Trying to figure out a way to calculate the minimum percentage match when comparing a string to a column.

Example:

Column A        Column B
Key             Keylime
Key Chain       Status
                Serious
                Extreme
                Key

Where

Column A        Column B     Column C                 Column D
Key             Temp         100%                     Key
Key Chain       Status       66.7%                    Key Ch
Ten             Key Ch       100%                     Tenure
                Extreme       
                Key
                Tenure 

To expand on this:

To expand on Column C - when looking at Key Chain - the highest match to any word it has in column B is for Key Ch where 6 out of the 9 characters (including space) of Key Chain match to give a percentage match of (6/9) = 66.7%

Upvotes: 0

Views: 344

Answers (1)

iamanigeeit
iamanigeeit

Reputation: 834

This should work (i haven't tested it, currently on Linux). Call getStrMatch for each string.

Type StrMatch
    Percent As Double
    Word As String
End Type

Function getStrMatch(s As String, RefRange As Range) As StrMatch
    Dim i As Long, ref_str As String
    Dim BestMatch As StrMatch: BestMatch.Percent = -1
    Dim match_pc As Double
    With RefRange
        For i = 1 to .Cells.Count
            ref_str = .Cells(i).Value2
            match_pc = getMatchPc(s, ref_str)
            If match_pc > BestMatch.Percent Then
                BestMatch.Percent = match_pc
                BestMatch.Word = ref_str
            End If
        Next i
    End With
    getStrMatch = BestMatch
End Function

Function getMatchPc(s As String, ref_str As String) As Double
    Dim s_len As Long: s_len = Len(s)
    Dim ref_len As Long: ref_len = Len(ref_str) 
    Dim longer_len as Long
    If s_len > ref_len Then longer_len = s_len Else longer_len = ref_len
    Dim m As Long: m = 1
    While m <= longer_len
        If Mid(s, m, 1) <> Mid(ref_str, m, 1) Then Exit While
        m = m + 1
    Wend
    getMatchPc = (m - 1.0) / longer_len
End Function

Note that you have to put this in a module or else declare Private Type and Private Function.

Also, if you're matching a lot of strings you probably should create a trie instead, as this is only doing naive string compares and each getStrMatch costs O(mn) where m is the size of the RefRange and n is the average ref_str length.

Upvotes: 1

Related Questions