Metabinary
Metabinary

Reputation: 187

Excel [VBA] find duplicate data and delete the oldest one

I have a problem in MS Excel. I´ve a spreadsheet with data like this:

 Name     |     timestamp
 ------------------------
 Smith    | 12.05.2015
 Smith    | 01.01.2015
 Smith    | 10.05.2015
 Simpson  | 14.04.2015
 Simpson  | 10.02.2015
 Simpson  | 21.03.2015
 Simpson  | 02.01.2015

The data I´ve is much bigger and komplex and there are duplicates with different timestamps. Now I want to delete the oldes ones and want an output like this:

 Name     |     timestamp
 Smith    | 12.05.2015
 Simpson  | 14.04.2015

I know how to remove duplicates, but in this case it´s a little bit different. I hope you can help me to solve the problem.

Upvotes: 3

Views: 2640

Answers (2)

Jack Geiger
Jack Geiger

Reputation: 115

I did a bit of testing, and Range.RemoveDuplicates appears to keep the first entry for each duplicate value (at least in a sorted range, which you're going to use). Here's my solution:

Sub SortAndCondense()
'This subroutine sorts a table by name and secondarily by descending date.  It then removes
'all duplicates in the name column.  By sorting the dates in descending order, only the most
'recent entries for each name are preserved
Dim wrkSht As Worksheet
Set wrkSht = ThisWorkbook.Worksheets("Sheet1")
Dim dateTable As Range
Dim header1 As Range, header2 As Range
Set dateTable = wrkSht.Range("A2:B7") 'insert your actual table range; modify as necessary for column headers
Set header1 = wrkSht.Range("A2")
Set header2 = wrkSht.Range("B2")

'sort the column primarily by name, and secondarily by descending date.  The order in which the names are sorted
'is irrelevant.
dateTable.Sort Key1:=header1, Key2:=header2, Order2:=xlDescending

'remove all duplicate names.  The way remove duplicates works (so far as I can tell) is that it keeps only the
'topmost entry for each duplicate column.
dateTable.RemoveDuplicates 1
End Sub

Upvotes: 0

Mark Fitzgerald
Mark Fitzgerald

Reputation: 3068

You may not need VBA.

In my experience the Excel Remove Duplicates code works to remove the first encountered duplicates in a list.

So sort your data by Name ascending and timestamp descending, then remove the duplicates from the Name field only.

You should be left with the most recent names.

Upvotes: 1

Related Questions