Bluesector
Bluesector

Reputation: 329

Remove duplicates (huge amount of Data, very slow)

I've a macro that deletes duplicates (based on column A). It sorts column P ascending then it removes the entire row that is a duplicate, so I can make sure that the macro only removes the oldest lines (column P = dates):

Sub SortAndRemoveDUBS()

Dim Rng As Range
Dim LastRow As Long
Dim i As Long

Application.ScreenUpdating = False

LastRow = Cells(Rows.Count, "B").End(xlUp).Row

Set Rng = Range("A4:P" & LastRow)

With Rng
    .Sort Key1:=Range("A4"), Order1:=xlAscending, key2:=Range("P4"), order2:=xlDescending, _
        Header:=xlYes, OrderCustom:=1, MatchCase:=False, _
        Orientation:=xlTopToBottom
End With

For i = LastRow To 2 Step -1
    If WorksheetFunction.CountIf(Range(Cells(2, "A"), Cells(i, "A")), Cells(i, "A")) > 1 Then
        Rows(i).Delete
    End If
Next i

Application.ScreenUpdating = True

End Sub

But the macro is very slow... is there a way to speed that up? I think it's because he deletes every duplicate one by one.

Upvotes: 1

Views: 475

Answers (3)

Clauric
Clauric

Reputation: 1886

Similar to @Fabrizio's comments, I found this one to work quite well.

Sub Delete_row()

Dim a As Variant

    ' selects all data in columns A to P and sorts by data in column P from oldest to newest
    Columns("A:P").Select
    ActiveWorkbook.Worksheets("Sheet1").Sort.SortFields.Clear
    ActiveWorkbook.Worksheets("Sheet1").Sort.SortFields.Add Key:=Range( _
        "P:P"), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:= _
        xlSortNormal
        With ActiveWorkbook.Worksheets("Sheet1").Sort
        .SetRange Range("A:P")
        .Header = xlGuess
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With

    a = 2

    While Cells(a, 16) <> vbNullString

'       Marks column Q with a value of 1 for every cell in P
'       that has the same date as the previous cell

        If Cells(a, 16) = Cells(a - 1, 16) Then
            Cells(a, 17) = 1
        End If

        a = a + 1
    Wend

'       Filters column Q for the value of 1

        Columns("A:Q").AutoFilter
        ActiveSheet.Range("$A:Q").AutoFilter Field:=17, Criteria1:="<>"

        Range(Selection, Selection.End(xlToRight)).Select
        Range(Selection, Selection.End(xlDown)).Select
        Selection.ClearContents

        ActiveSheet.Range("$A:Q").AutoFilter Field:=17

        Columns("A:P").Select
        ActiveWorkbook.Worksheets("Sheet1").Sort.SortFields.Clear
        ActiveWorkbook.Worksheets("Sheet1").Sort.SortFields.Add Key:=Range( _
            "P:P"), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:= _
            xlSortNormal
            With ActiveWorkbook.Worksheets("Sheet1").Sort
            .SetRange Range("A:P")
            .Header = xlGuess
            .MatchCase = False
            .Orientation = xlTopToBottom
            .SortMethod = xlPinYin
            .Apply
        End With

        Columns("Q:Q").ClearContents

End Sub

I've changed the code to increase the speed of the macro. Runs in about 30-35 seconds using Excel 2010, 32-bit, with 2nd gen i5, and 8GB RAM.

Upvotes: 0

Excel Developers
Excel Developers

Reputation: 2825

CountIf is slow, and deleting rows one at a time is slow. Try using a Dictionary (you will need to set a reference to Microsoft Scripting Runtime).

Sub SortAndRemoveDUBS()

Dim Rng As Range
Dim LastRow As Long
Dim i As Long

Application.ScreenUpdating = False

LastRow = Cells(Rows.Count, "B").End(xlUp).Row

Set Rng = Range("A4:P" & LastRow)

With Rng
    .Sort Key1:=Range("A4"), Order1:=xlAscending, key2:=Range("P4"), order2:=xlDescending, _
        Header:=xlYes, OrderCustom:=1, MatchCase:=False, _
        Orientation:=xlTopToBottom
End With

Dim dict As New Dictionary
Dim r As Range

For i = 2 To LastRow
    If dict.Exists(Cells(i, "A").Value) Then
        If r Is Nothing Then
            Set r = Cells(i, "A")
        Else
            Set r = Union(r, Cells(i, "A"))
        End If
    Else
        dict.Add Cells(i, "A").Value, 1
    End If
Next i

r.EntireRow.Delete
Application.ScreenUpdating = True

End Sub

Upvotes: 2

Stupid_Intern
Stupid_Intern

Reputation: 3450

You can do the delete operation at the end by collecting all the row numbers in an array like this:

(NOT Tested)

Dim arr() as variant ,cnt As LOng
cnt=0

For i = LastRow To 2 Step -1
    If WorksheetFunction.CountIf(Range(Cells(2, "A"), Cells(i, "A")), Cells(i, "A")) > 1 Then
      Redim Preserve arr(cnt)
      arr(cnt) = i  
      cnt=cnt+1
    End If
Next i

If Len(join(arr))> 0 then ActiveSheet.Range("A" & Join(arr, ",A")).EntireRow.Delete

Upvotes: 2

Related Questions