CurveGamma
CurveGamma

Reputation: 4559

Speeding up a search and delete macro

I have a list containing three columns. The first column contains Names and the other two columns have numbers. The macro takes the first name(A1) and then searches down column A for another occurrence.

When it finds it, it deletes the entire row.It then goes to A2 and does the same thing agan. It works ok for about 500 entries, but using 3000 entries slows it down considerably. Is there a way to speed up this code?

Sub Button1_DeleteRow()

Dim i As Integer
Dim j As Integer    
Dim Value As Variant
Dim toCompare As Variant

For i = 1 To 3000      
    Value = Cells(i, 1)
    For j = (i + 1) To 3000
        toCompare = Cells(j, 1)
        If (StrComp(Value, toCompare, vbTextCompare) = 0) Then
           Rows(j).EntireRow.Delete
        End If
      Next j  
Next i

End Sub 

Upvotes: 2

Views: 686

Answers (4)

barrowc
barrowc

Reputation: 10679

Sorting the data on column A would then make it trivial to identify and remove the duplicates in a single pass


In response to the comment below, I'll explain why sorting is a useful technique.

By sorting column A into order, duplicate removal simply becomes a matter of comparing adjacent entries in column A. You can then either delete the duplicate rows as you find them or flag them for later deletion.

The process should actually be a lot less tedious as you only have to sort the list (and sorting, being built-in, tends to be very fast) and then do one pass (instead of 4498500) through the list deleting/flagging as you go (obviously you need a subsequent clean-up pass if you go for flagging).

On the issue of changing the order of the list, start by adding an extra column (e.g. column D) and have D2 contain the value 2 (i.e. just the row number). A quick fill-down later and every row is numbered. After sorting and deleting/flagging, restoring the original order is just a matter of re-sorting on column D which could then be deleted.

I use this method when I have to perform some operation or other on the duplicates. In other words, column A has duplicate values but the values in columns B and C are meaningful (for example, I might want to sum these values from all of the entries relating to the specific value of column A). In many cases, however, it would be easier just to use SQL to achieve the same result

Upvotes: 0

Charles Williams
Charles Williams

Reputation: 23520

Using Bretts technique is a good answer: but to answer your question about why does it take so long:
- Your macro is getting a value from over 4 million cells one by one. This is very slow.
- I don't see that your macro has switched off screenupdating and automatic calculation: every time a row is deleted the screen will refresh and Excel will recalculate. If you have not switched these off it is very slow.
This code should run a lot faster

Option Explicit
Sub Button1_DeleteRow()
Dim i As Long
Dim j As Long
Dim vArr As Variant
Dim iComp As Long
Dim Deletes(1 To 3000) As Boolean
Application.ScreenUpdating = False
iComp = Application.Calculation
Application.Calculation = xlCalculationManual
vArr = Range("a1:A3000")
For i = 1 To 3000
    For j = (i + 1) To 3000
        If (StrComp(vArr(i, 1), vArr(j, 1), vbTextCompare) = 0) Then
           Deletes(j) = True
        End If
      Next j
Next i
For j = 3000 To 1 Step -1
If Deletes(j) Then Rows(j).EntireRow.Delete
Next j
Application.ScreenUpdating = True
Application.Calculation = iComp
End Sub

Upvotes: 2

To supplement @brettdj's answer, if you are running Excel 2003, you can do this using AdvancedFilter as follows:

Range("A1:A11").AdvancedFilter Action:=xlFilterInPlace, Unique:=True

Note: AdvancedFilter assumes that the first row of your range (row A in this example) contains column headers and will not include that row in the filtering.

To do this manually: Data > Filter > Advanced Filter... > Unique records only

Upvotes: 3

brettdj
brettdj

Reputation: 55682

If you are running xl07/10 then you can do this with a single line with Remove Duplicates. If you are running 03 then a solution with AutoFilter will be most efficient (I can provide this if you are on the older version)

Remove Duplicates

  1. Manually

    • Select column A
    • Data .... Remove Duplicates
    • Expand selection
    • Select only column A to find duplicates on
  2. Code

    ActiveSheet.Range("$A$1:$A$3000").EntireRow.RemoveDuplicates Columns:=1, Header:=xlNo

before only A after

Upvotes: 4

Related Questions