drLecter
drLecter

Reputation: 197

Excel VBA: Optimizing code to delete rows based on a duplicate in a column

I am trying to come up with a lean and error-proofed macro to delete rows containing duplicate values in a column A. I have two solutions and both have their advantages. None of them are exactly what I want.

I need rows containing duplicates deleted but leaving the last row that contained the duplicate.

  1. This one is awesome. It has no loop and works instantaneously. The problem is that it deletes subsequent rows containing duplicates hence leaving the first occurrence of the duplicate (And I need the last/ or second - most show up only twice)

    Sub Delete() ActiveSheet.Range("A:E").RemoveDuplicates Columns:=1, Header:=xlNo End Sub

  2. This one goes from the bottom and deletes duplicates. It lasts longer than the first one ( I have around 6k rows) But the issue with this one is that it doesnt delete them all. Some duplicates are left and they are deleted after I run the same code again. Even smaller number of duppes is still left. Basically need to run it up to 5 times and then I end up with clean list.

    `

    Sub DeleteDup()

      Dim LastRowcheck As Long, n1 As Long, rowschecktodelete As Long
    
      LastRowcheck = Worksheets("Sheet1").Range("A" & Rows.Count).End(xlUp).Row
    
      For n1 = 1 To LastRowcheck
        With Worksheets("Sheet1").Cells(n1, 1)
          If Cells(n1, 1) = Cells(n1 + 1, 1) Then
            Worksheets("Sheet1").Cells(n1, 1).Select
            Selection.EntireRow.Delete
         End If
       End With
      Next n1
    
      End Sub
    

` Is there a way to improve any of these to work well or is there a better solution? Any info is greatly appreciated. Thanks

Upvotes: 0

Views: 1284

Answers (2)

Dirk Reichel
Dirk Reichel

Reputation: 7979

The easiest way would be to delete all rows at once. Also to increase speed, you better do your checks with variables and not with the real cell values like this:

Sub DeleteDup()

  Dim LastRowcheck As Long
  Dim i As Long
  Dim rows_to_delete As Range
  Dim range_to_check As Variant

  With Worksheets("Sheet1")
    LastRowcheck = .Cells(Rows.Count, 1).End(xlUp).Row
    range_to_check = .Range("A1:A" & LastRowcheck).Values

    For i = 1 To LastRowcheck - 1
      If range_to_check(i, 1) = range_to_check(i + 1, 1) Then
        If rows_to_delete Is Nothing Then
          Set rows_to_delete = .Cells(i, 1)
        Else
          Set rows_to_delete = Union(.Cells(i, 1), rows_to_delete)
        End If
      End If
    Next n1
  End With

  rows_to_delete.EntireRow.Delete

End Sub

Upvotes: 2

Comintern
Comintern

Reputation: 22205

The concept is right, but remember that when you delete rows, Cells(n1 + 1, 1) isn't going to be the same thing as it was before you deleted a row. The solution is to simply reverse the loop and test rows from bottom to top:

Sub DeleteDup()
    Dim last As Long
    Dim current As Long
    Dim sheet As Worksheet

    Set sheet = Worksheets("Sheet1")
    With sheet
        last = .Range("A" & .Rows.Count).End(xlUp).Row
        For current = last To 1 Step -1
            If .Cells(current + 1, 1).Value = .Cells(current, 1).Value Then
                .Rows(current).Delete
            End If
        Next current
    End With
End Sub

Note that you can use the loop counter to index .Rows instead of using the Selection object to improve performance fairly significantly. Also, if you grab a reference to the Worksheet and toss the whole thing in a With block you don't have to continually dereference Worksheets("Sheet1"), which will also improve performance.

If it still runs too slow, the next step would be to flag rows for deletion, sort on the flag, delete the entire flagged range in one operation, then sort back to the original order. I'm guessing the code above should be fast enough for ~6K rows though.

Upvotes: 1

Related Questions