Reputation: 25
To put it simply, for this project, I would like to delete duplicates, keep the latest entries of duplicates and replace these latest entries in the first entries’ row. Please follow the example given below for a better understanding:
I am looking to delete duplicates based on the ID number by keeping the latest entries from Column A to C. Additionally, I want to keep every cell in Column D and Column E from the first entries. This ultimately means that the latest entries will be replaced in the first entries’ Column A, B & C.
Important note: Column D & E will only be filled in the first entry of each ID. All the additional rows with the same ID will always contain empty cells in columns D & E.
Please see tables below for more clarity and which reflects explanations above:
Based on the example given above, this means:
Delete duplicates based on ID from columns A to C and keep latest entries of each ID (From Column A to C: Delete content in row 1, 2, 3, 5 & 6 + keep latest entries of each ID which are rows 4 & 7 in this case)
Keep Columns D & E from the first entries of each ID (note that only the first entries of each ID will be non-empty cells. In this example, there are two IDs, 123 and 458, and only row 1 and 2 from column D & E will be non-empty)
Replace previous entries by the latest ones in the previous entries’ row from column A to C (From Column A to C, replace row 1 & 2 by row 4 & 7 respectively)
In other words: Update Columns A to C without modifying Columns D to E
See below the same Tables with indications:
I have tried two different codes, but both are not giving me the end results I am looking for.
So, the initial code that I had was as follow. It only kept the previous entries and kept columns A to E as they were initially:
Sub Delete_Duplicates()
Sheet5.Range("$A$1:$E$29999").RemoveDuplicates Columns:=Array(1) _
, Header:=xlYes
End Sub
End results is not accurate as it keeps the first entries in Column A to C:
The problem in the code above is that it does not change the name and date to the latest entries (which would respectively be Bob, Week 6 and Peter, Week 4)
The next code I did was to keep the newest entries, but this, unfortunately, deletes my entries in column D to E:
Sub Delete_Duplicates_2()
Dim Rng As Range, Dn As Range, n As Long
Dim Lst As Long, nRng As Range
Set Rng = Sheet5.Range("$A$2:$E$29999")
Lst = Range("A" & Rows.Count).End(xlUp).Row
With CreateObject("scripting.dictionary")
.CompareMode = vbTextCompare
For n = Lst To 1 Step -1
If Not .Exists(Range("A" & n).Value) Then
.Add Range("A" & n).Value, Nothing
Else
If nRng Is Nothing Then
Set nRng = Range("A" & n)
Else
Set nRng = Union(nRng, Range("A" & n))
End If
End If
Next n
If Not nRng Is Nothing Then
nRng.EntireRow.Delete
End With
End Sub
Below is the result I am obtaining from the second code:
The code above works perfectly to replace my first entries with the latest ones, but it deletes everything in column D & E (Comments & Additional com). I was wondering maybe if my code could be modified by only replacing the duplicates in specific columns, instead of deleting the entire row (which is obviously the problem in this code).
I hope the explanations were clear enough for you to help me out on this matter. Please bear in mind that I have thousands of rows, and a tailored solution to the example I gave would not be what I am looking for. I am open to any suggestions and thank you for your help!
Upvotes: 0
Views: 553
Reputation: 60224
This routine uses a dictionary object to remove the duplicates.
To retain the last row of a duplicate, we start at the bottom and work our way up.
If we do have a duplicate, we test to see if there is anything in columns 4 or 5, if there is, we overwrite that in the dictionary (note the array item(s) cannot be changed directly, but we must extract the array, alter it, and put it back.
We then create a results array and write it back to the worksheet.
Judicious selection of wsRes
, wsSrc
and rRes
will allow you to have the results on a separate worksheet, or even overwrite the original data (although I wouldn't advise that for auditing purposes).
Note that you must set a reference as noted in the comments of the code, or use late-binding.
Option Explicit
'Set reference to Microsoft Scripting Runtime or
' use late-binding
Sub deDup()
Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
Dim vSrc As Variant, vRes As Variant
Dim vRow(2 To 5) As Variant, vKey As Variant, vTemp As Variant
Dim I As Long, J As Long
Dim D As Dictionary
Set wsSrc = Worksheets("sheet3")
Set wsRes = Worksheets("Sheet3")
Set rRes = wsRes.Cells(1, 9)
With wsSrc
vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(columnsize:=5)
End With
Set D = New Dictionary
For I = UBound(vSrc, 1) To 2 Step -1
vKey = vSrc(I, 1)
If Not D.Exists(vKey) Then
For J = 2 To 5
vRow(J) = vSrc(I, J)
Next J
D.Add Key:=vKey, Item:=vRow
Else
If vSrc(I, 4) <> "" Or vSrc(I, 5) <> "" Then
vTemp = D(vKey)
vTemp(4) = vSrc(I, 4)
vTemp(5) = vSrc(I, 5)
D(vKey) = vTemp
End If
End If
Next I
ReDim vRes(0 To D.Count, 1 To 5)
'Headers
For J = 1 To 5
vRes(0, J) = vSrc(1, J)
Next J
'Data
I = 0
For Each vKey In D.Keys
I = I + 1
vRes(I, 1) = vKey
For J = 2 To 5
vRes(I, J) = D(vKey)(J)
Next J
Next vKey
Set rRes = rRes.Resize(rowsize:=D.Count + 1, columnsize:=UBound(vRes, 2))
With rRes
.EntireColumn.Clear
.Value = vRes
With .Rows(1)
.Font.Bold = True
.HorizontalAlignment = xlCenter
End With
.EntireColumn.AutoFit
End With
End Sub
Upvotes: 1
Reputation: 8531
I have my data as follows (Column A ID, Column B name, Column C data)
A B C
1 a Last
1 a
2 b pre
2 b
3 c test
3 c test2
3 c
3 c
If you get the unique ID's and put them in a column, use VBA, or formula for this.
Then you can, in VBA, use evaluate
to get the last value from the data, like so
evaluate("INDEX($C$1:$C$8,MAX(($A$1:$A$8=F1)*($C$1:$C$8<>"""")*ROW($A$1:$A$8)),1)")
Where column F is the unique ID number.
This assumes that the data is chronologically ordered.
Upvotes: 0