FrenchConnections
FrenchConnections

Reputation: 391

For every cell in a column, search the rest of the column for a duplicate

I have two large databases of names that I compare to each other for a variety of criteria. However, the names are not always spelled the same, so I end up with a lot of duplicate errors. Examples are Joe Smith in one dataset, Joseph Smith in another, Chris Parker in set 1, Christopher Parker in set 2 etc. All of the errors are collected into one column, so I end up with something that looks like this:

Comparing Data set 1 to Data set 2

Column A
Joe Smith in set 1 || not in set 2
Joseph Smith in set 2 || not in set 1
Nicholas Chun no errors
Alex Conrad in set 2 || not in set 1
Alexander Conrad in set 1 || not in set 2

What I think I need to do is take the first five characters of the last name, and count the number of times that it occurs in the column, then highlight all cells for which the counter >=2. The counter should equal 1 if there is no duplicate, as there will always be the one cell with the string.

I'm not sure how to go about programming this in VBA, since I am a beginner. Any ideas, advice or examples would be welcome.

Update: I've been able to do something like this with formulas. I use this formula to extract the first two letters and the second word from the string:

LEFT(A1,2)&" "&MID(MID(MID(SUBSTITUTE(A1," ","^",1),1,256),FIND("^",SUBSTITUTE(A1," ","^",1)),256),2,FIND(" ",MID(MID(SUBSTITUTE(A1," ","^",1),1,256),FIND("^",SUBSTITUTE(A1," ","^",1)),256))-2)

Then I use a countif statement to see how many times that string appears in the new column, and if countif>=2, I flag it as a possible error. However, I'm struggling to translate this into code, without adding any columns to the spreadsheet. I think the right approach is to declare the range as an array, and run the comparison on every cell in the array. Here is what I've got so far:

Function WordExtract(Source As String, Position As Integer)
Dim arr() As String
arr = VBA.Split(Source, " ")
If xcount < 1 Or (Position - 1) > xcount Or Position < 0 Then
WordExtract = ""
Else
WordExtract = arr(Position - 1)
End If
End Function

Sub NameEnhancer()

Dim arr As Variant
Dim LastRowReport As Long
Dim i As Long
Dim j As Long
Dim ws As Worksheet

Set ws = Sheets("Sheet2")

LastRowReport = ws.Cells(Rows.Count, "AA").End(xlUp).Row

arr = ws.Range("AA4:AA" & LastRowReport).Value
For i = 1 To UBound(arr)
arr = LEFT(RC[-1],2)&" "&WordExtract(i, 2)

End Sub

Upvotes: 1

Views: 162

Answers (1)

ashareef
ashareef

Reputation: 1846

This might be a complex problem to solve "correctly" for a total beginner. What you're looking for can be found here.

Finding groups of similar strings in a large set of strings

VLOOKUP in excel also has a fuzzy search of sorts but I'm not sure of the quality of the matching.

Upvotes: 0

Related Questions