Uru
Uru

Reputation: 25

Replacing first duplicate with last duplicate in first duplicate’s row – Excel VBA

To put it simply, for this project, I would like to delete duplicates, keep the latest entries of duplicates and replace these latest entries in the first entries’ row. Please follow the example given below for a better understanding:

I am looking to delete duplicates based on the ID number by keeping the latest entries from Column A to C. Additionally, I want to keep every cell in Column D and Column E from the first entries. This ultimately means that the latest entries will be replaced in the first entries’ Column A, B & C.

Important note: Column D & E will only be filled in the first entry of each ID. All the additional rows with the same ID will always contain empty cells in columns D & E.

Please see tables below for more clarity and which reflects explanations above: Example with the first table: database, second table: result after macro

Based on the example given above, this means:

In other words: Update Columns A to C without modifying Columns D to E

See below the same Tables with indications: Two previous tables with indications

I have tried two different codes, but both are not giving me the end results I am looking for.

So, the initial code that I had was as follow. It only kept the previous entries and kept columns A to E as they were initially:

Sub Delete_Duplicates()
    Sheet5.Range("$A$1:$E$29999").RemoveDuplicates Columns:=Array(1) _
    , Header:=xlYes
End Sub  

End results is not accurate as it keeps the first entries in Column A to C: Table results after first macro test

The problem in the code above is that it does not change the name and date to the latest entries (which would respectively be Bob, Week 6 and Peter, Week 4)

The next code I did was to keep the newest entries, but this, unfortunately, deletes my entries in column D to E:

Sub Delete_Duplicates_2()
Dim Rng As Range, Dn As Range, n As Long
Dim Lst As Long, nRng As Range
Set Rng = Sheet5.Range("$A$2:$E$29999")
Lst = Range("A" & Rows.Count).End(xlUp).Row
    With CreateObject("scripting.dictionary")
    .CompareMode = vbTextCompare
    For n = Lst To 1 Step -1
    If Not .Exists(Range("A" & n).Value) Then
    .Add Range("A" & n).Value, Nothing
    Else
        If nRng Is Nothing Then
            Set nRng = Range("A" & n)
        Else
            Set nRng = Union(nRng, Range("A" & n))
        End If
    End If
    Next n
    If Not nRng Is Nothing Then 
    nRng.EntireRow.Delete
    End With
End Sub

Below is the result I am obtaining from the second code: Table results after second macro test

The code above works perfectly to replace my first entries with the latest ones, but it deletes everything in column D & E (Comments & Additional com). I was wondering maybe if my code could be modified by only replacing the duplicates in specific columns, instead of deleting the entire row (which is obviously the problem in this code).

I hope the explanations were clear enough for you to help me out on this matter. Please bear in mind that I have thousands of rows, and a tailored solution to the example I gave would not be what I am looking for. I am open to any suggestions and thank you for your help!

Upvotes: 0

Views: 553

Answers (2)

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60224

This routine uses a dictionary object to remove the duplicates.

To retain the last row of a duplicate, we start at the bottom and work our way up.

If we do have a duplicate, we test to see if there is anything in columns 4 or 5, if there is, we overwrite that in the dictionary (note the array item(s) cannot be changed directly, but we must extract the array, alter it, and put it back.

We then create a results array and write it back to the worksheet.

Judicious selection of wsRes, wsSrc and rRes will allow you to have the results on a separate worksheet, or even overwrite the original data (although I wouldn't advise that for auditing purposes).

Note that you must set a reference as noted in the comments of the code, or use late-binding.

Option Explicit
'Set reference to Microsoft Scripting Runtime or
'    use late-binding
Sub deDup()
    Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
    Dim vSrc As Variant, vRes As Variant
    Dim vRow(2 To 5) As Variant, vKey As Variant, vTemp As Variant
    Dim I  As Long, J As Long
    Dim D As Dictionary

 Set wsSrc = Worksheets("sheet3")
 Set wsRes = Worksheets("Sheet3")
    Set rRes = wsRes.Cells(1, 9)

With wsSrc
    vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(columnsize:=5)
End With

Set D = New Dictionary
For I = UBound(vSrc, 1) To 2 Step -1
    vKey = vSrc(I, 1)
    If Not D.Exists(vKey) Then
        For J = 2 To 5
            vRow(J) = vSrc(I, J)
        Next J
        D.Add Key:=vKey, Item:=vRow
    Else
        If vSrc(I, 4) <> "" Or vSrc(I, 5) <> "" Then
            vTemp = D(vKey)
            vTemp(4) = vSrc(I, 4)
            vTemp(5) = vSrc(I, 5)
            D(vKey) = vTemp
        End If
    End If
Next I

ReDim vRes(0 To D.Count, 1 To 5)

    'Headers
    For J = 1 To 5
        vRes(0, J) = vSrc(1, J)
    Next J

    'Data
    I = 0
    For Each vKey In D.Keys
        I = I + 1
        vRes(I, 1) = vKey
        For J = 2 To 5
            vRes(I, J) = D(vKey)(J)
        Next J
    Next vKey

Set rRes = rRes.Resize(rowsize:=D.Count + 1, columnsize:=UBound(vRes, 2))
With rRes
    .EntireColumn.Clear
    .Value = vRes
    With .Rows(1)
        .Font.Bold = True
        .HorizontalAlignment = xlCenter
    End With
    .EntireColumn.AutoFit
End With
End Sub

enter image description here

Upvotes: 1

Nathan_Sav
Nathan_Sav

Reputation: 8531

I have my data as follows (Column A ID, Column B name, Column C data)

A       B       C

1   a   Last

1   a   

2   b   pre

2   b   

3   c   test

3   c   test2

3   c   

3   c

If you get the unique ID's and put them in a column, use VBA, or formula for this.

Then you can, in VBA, use evaluate to get the last value from the data, like so

evaluate("INDEX($C$1:$C$8,MAX(($A$1:$A$8=F1)*($C$1:$C$8<>"""")*ROW($A$1:$A$8)),1)")

Where column F is the unique ID number.

This assumes that the data is chronologically ordered.

Upvotes: 0

Related Questions