Reputation: 81
Hi there I'm hoping someone here can help me out.
I've got some pretty untidy data and I am looking to do a VLOOKUP / INDEX MATCH sort of function but having no luck as the data is quite untidy.
So basically I've got a list of names and there websites in one sheet.
Then a second sheet with just a list of names that I want to populate with the websites from the other sheet as to cut down the work load.
The problem is the company names are not entered in the correct format (i.e for an abbreviated company spaces are used instead of .'s). Another example is when there is multiple branches of a company.
So worksheet 1 would be
Company Name: Company
Website: www.company.com
Worksheet 2 would be: Company Name : Company (UK) Company Name : Company (USA)
Where both should have the same website as found in worksheet 1 is there any possible way to do this. I have tried VLOOKUP and INDEX/MATCH and set the condition to TRUE but it does not return good results.
Thanks in advance for any advice.
Upvotes: 0
Views: 2980
Reputation: 1
To expand on the answer by Trum because the link he posted is no longer available, I have translated the JaroWinklerProximity algorithm found in this C# implementation https://stackoverflow.com/a/19165108/8031589 by user leebickmtu to VBA for Excel:
Option Base 1
Function JaroWinklerProximity(String1 As Range, String2 As Range) As Double
Dim mWeightThreshold As Double
mWeightThreshold = 0.7
Dim mNumChars As Integer
mNumChars = 4
Dim aString1 As String
aString1 = LCase(String1.Text)
Dim aString2 As String
aString2 = LCase(String2.Text)
Dim lLen1 As Integer
lLen1 = Len(aString1)
Dim lLen2 As Integer
lLen2 = Len(aString2)
If lLen1 = 0 Then
If lLen2 = 0 Then
JaroWinklerProximity = 1
Exit Function
Else
JaroWinklerProximity = 0
Exit Function
End If
End If
Dim lSearchRange As Integer
lSearchRange = WorksheetFunction.Max(1, WorksheetFunction.Max(lLen1, lLen2) / 2)
ReDim lMatched1(lLen1) As Boolean
ReDim lMatched2(lLen2) As Boolean
Dim lNumCommon As Integer
lNumCommon = 0
Dim i As Integer
For i = 1 To lLen1 Step 1
Dim lStart As Integer
lStart = WorksheetFunction.Max(1, i - lSearchRange)
Dim lEnd As Integer
lEnd = WorksheetFunction.Min(i + lSearchRange, lLen2)
Dim j As Integer
For j = lStart To lEnd - 1 Step 1
If lMatched2(j) Then
GoTo NextIteration1
End If
Dim charAtIndex1 As String
charAtIndex1 = Mid(aString1, i, 1)
Dim charAtIndex2 As String
charAtIndex2 = Mid(aString2, j, 1)
If charAtIndex1 <> charAtIndex2 Then
GoTo NextIteration1
End If
lMatched1(i) = True
lMatched2(j) = True
lNumCommon = lNumCommon + 1
Exit For
NextIteration1:
Next j
Next i
If lNumCommon = 0 Then
JaroWinklerProximity = 0
Exit Function
End If
Dim lNumHalfTransposed As Integer
lNumHalfTransposed = 0
Dim k As Integer
k = 1
For i = 1 To lLen1 Step 1
If Not lMatched1(i) Then
GoTo NextIteration2
End If
Do While Not lMatched2(k)
k = k + 1
Loop
If Mid(aString1, i, 1) <> Mid(aString2, j, 1) Then
lNumHalfTransposed = lNumHalfTransposed + 1
End If
k = k + 1
NextIteration2:
Next
Dim lNumTransposed As Integer
lNumTransposed = lNumHalfTransposed / 2
Dim lNumCommonD As Double
lNumCommonD = lNumCommon
Dim lWeight As Double
lWeight = (lNumCommonD / lLen1 + lNumCommonD / lLen2 + (lNumCommon - lNumTransposed) / lNumCommonD) / 3
If lWeight <= mWeightThreshold Then
JaroWinklerProximity = lWeight
Exit Function
End If
Dim lMax As Integer
lMax = WorksheetFunction.Min(mNumChars, WorksheetFunction.Min(Len(aString1), Len(aString2)))
Dim lPos As Integer
lPos = 1
Do While lPos < lMax And Mid(aString1, lPos, 1) = Mid(aString2, lPos, 1)
lPos = lPos + 1
Loop
If lPos = 1 Then
JaroWinklerProximity = lWeight
Exit Function
End If
JaroWinklerProximity = lWeight + 0.1 * lPos * (1# - lWeight)
End Function
Upvotes: 0
Reputation: 617
Without seeing the data it's impossible to say for sure, but here are some options.
1. You could use MATCH("*Company*",Sheets1!A:A,0)
.
2. If there is always the same pattern, you could create a helper column using LEFT, RIGHT, and MID to extract the company name.
3. VBA offers some support for regular expressions, using Set reg = CreateObject("VBScript.RegExp")
If you know how to use regular expressions, you could write a macro to match based on it.
Upvotes: 1
Reputation: 630
Fuzzy matching is a far from an exact science, particularly when it comes to the built in Excel functions.
If I was to recommend the safest bet - I would say create a de-duped list of the values in sheet 2 and create a matching lookup value for sheet one. This would certainly be the most accurate (but time consuming and depending on how many times you will use these values)
Alternatively, there are 'fuzzy matches' from various sources that can be used. One of the famous ones is the 'Jaro Winkler Distance' : a version of which can be found at this link: http://garonfolo.dk/herbert/2013/07/excel-vba-jaro-winkler-distance-fuzzy-matching/
Upvotes: 1