Reputation: 721
I am trying to remove duplicate ID's in a sheet. For example here is a few rows of data
ID | Department | Sales | Update Date
1 | Sales | 100 |
2 | Marketing | 100 |
2 | Marketing | 200 | 30/06/2015
2 | Marketing | 300 | 05/07/2015
I want to remove the duplicate ID's but base this on the update date column. So I want to only have the following remaining:
ID | Department | Sales | Update Date
1 | Sales | 100 |
2 | Marketing | 300 | 05/07/2015
So it checks for the latest update row of that ID and removes the others.
Any advice on using VBA or a macro to do this would be great as it will form part of a automated script.
Upvotes: 2
Views: 141
Reputation: 12737
One way to achieve what you want to do is to read all rows and iterate through each duplicate row and find what to keep, what to delete based on finding the highest update_date.
I have successfully wrote a macro to do that. Here is my code:
First: Create a blank Module in the VBA Editor and paste the following code:
Public Type Row
id As String
updated As Date
row_number As Integer 'to know which rows to delete later
is_duplicate As Boolean 'to mark if current row is duplicate
to_keep As Boolean 'to decide whether to keep or to delete
verified As Boolean 'needed to avoid evaluating all rows with the same ID
End Type
Sub RemoveDuplicates()
Range("a2").Select 'go to first row
Dim cnt As Integer 'keep record of how many rows
cnt = 0 'begin with an empty array
Dim rows() As Row 'declared without the count
'== step 1: read all data and store in array ===============
Do While ActiveCell.Value <> ""
cnt = cnt + 1
ReDim Preserve rows(cnt) 'expand the size of the array by ONE
rows(cnt - 1).row_number = ActiveCell.Row 'keep record of current row address
If ActiveCell.Offset(1, 0).Value = ActiveCell.Value Or _
ActiveCell.Offset(-1, 0).Value = ActiveCell.Value Then
'if the cell above/below has the samve ID as the current cell, then it's duplicates
rows(cnt - 1).is_duplicate = True
Else
rows(cnt - 1).is_duplicate = False
End If
rows(cnt - 1).id = ActiveCell.Value 'store the id
rows(cnt - 1).updated = ActiveCell.Offset(0, 3).Value 'store the date
ActiveCell.Offset(1, 0).Select 'move to the next row below
Loop
'=== step 2: iterating throw the array and deciding what to keep, what to delete =========
For i = 0 To cnt - 1
If rows(i).is_duplicate And Not rows(i).verified Then
'the current ID is duplicated, and all of the other rows with the same ID has not been verified
find_to_keep rows, rows(i).id, cnt 'helper Sub to analyze each row
End If
Next
'==== step 3: iterating throw the array to delete ones marked to delete ==========
For i = cnt - 1 To 0 Step -1 'we have to reverse the order because deleted rows will contain data from other valid rows
If rows(i).is_duplicate And Not rows(i).to_keep Then
'if the current row is duplicate and is not marked (to keep) then it must be deleted
Dim r As Integer
r = rows(i).row_number 'get the rows number (address) of the row
Range(r & ":" & r).EntireRow.Delete shift:=xlShiftUp 'delete the row and shift the other rows below UP
End If
Next
End Sub
Sub find_to_keep(ByRef rows() As Row, ByVal id As String, ByVal cnt As Integer)
Dim max_date As Date 'temparary variable to hold the maximum date
Dim to_keep As Integer 'temporary variable to hold the location of row to keep
' -- step a: go throw the array and find all rows with id specified in the sub parameter
For i = 0 To cnt - 1
If rows(i).id = id Then
'if that row has a date that is higher than our current max_date, the read its date
If rows(i).updated > max_date Then
max_date = rows(i).updated
to_keep = i
End If
End If
Next
'-- step b: now that we know what row to keep, we need to do:
' 1- mark all other rows having the same ID as verified (to avoid looping through them again)
' 2- mark the row with the highest date to (to_keep) = true
For i = 0 To cnt - 1
If rows(i).id = id Then
If i = to_keep Then
rows(i).to_keep = True
Else
rows(i).to_keep = False
End If
rows(i).verified = True
End If
Next
End Sub
Here is what it looks like:
And if you like, I have attached the entire workbook for your reference: remove_Duplicates.xlsm
Upvotes: 1