Reputation: 23
Currently I have a macro that runs through a list and deletes duplicate values (in one column), but it's proving to be very inefficient. For every entry that it checks for duplicates, it has to run through the whole column; my file currently has 50,000 entries and that is no small task.
I think an easier way for the macro to work is for the macro to check if this value is in an array. If it is, then remove the row that the entry is in. If it isn't, add the value to the array.
Can someone provide some help with the basic outline of the macro? Thanks
Upvotes: 2
Views: 8545
Reputation: 2825
For Excel 2007 and later: Copy the array to a sheet and use the removeduplicates method:
set ws = worksheets.add
ws.[A1].resize(ubound(yourarray,1),ubound(yourarray,2)).value = yourarray
ws.usedrange.removeduplicates columns:=1, header:=no
This assumes the lower bound of your array is 1, that the column you want to de-duplicate is column 1 and that your list has no headers. You can then find the borders of the new range and read it back into your array (erase the current array first).
Upvotes: 1
Reputation: 5439
The Below code will loop through your source data and store it in an array, while simultaneously checking for duplicates. After the collection is complete it uses the array as a key to know which columns to delete.
Due to the high number of potentiol screen updates with the deletion be sure to turn screenupdating off. (included)
Sub Example()
Application.ScreenUpdating = false
Dim i As Long
Dim k As Long
Dim StorageArray() As String
Dim iLastRow As Long
iLastRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row
ReDim StorageArray(1 To iLastRow, 0 To 1)
'loop through column from row 1 to the last row
For i = 1 To iLastRow
'add each sheet value to the first column of the array
StorageArray(i, 0) = ActiveSheet.Range("A" & i).Value
'- keep the second column as 0 by default
StorageArray(i, 1) = 0
'- as each item is added, loop through previously added items to see if its a duplicate
For k = 1 To i-1
If StorageArray(k, 0) = StorageArray(i, 0) Then
'if it is a duplicate set the second column of the srray to 1
StorageArray(i, 1) = 1
Exit For
End If
Next k
Next i
'loop through sheet backwords and delete rows that were maked for deletion
For i = iLastRow To 1 Step -1
If StorageArray(i, 1) = 1 Then
ActiveSheet.Range("A" & i).EntireRow.Delete
End If
Next i
Application.ScreenUpdating = true
End Sub
As requested, here is a similar way to do it, using Collections instead of an Array for key indexing: (RBarryYoung)
Public Sub RemovecolumnDuplicates()
Dim prev as Boolean
prev = Application.ScreenUpdating
Application.ScreenUpdating = false
Dim i As Long, k As Long
Dim v as Variant, sv as String
Dim cl as Range, ws As Worksheet
Set ws = ActiveWorksheet 'NOTE: This really should be a parameter ...
Dim StorageArray As New Collection
Dim iLastRow As Long
iLastRow = ws.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row
'loop through column from row 1 to the last row
i = 1
For k = 1 To iLastRow
'add each sheet value to the collection
Set cl = ws.Cells(i, 1)
v = cl.Value
sv = Cstr(v)
On Error Resume Next
StorageArray.Add v, sv
If Err.Number <> 0 Then
'must be a duplicate, remove it
cl.EntireRow.Delete
'Note: our index doesn't change here, since all of the rows moved
Else
'not a duplicate, so go to the next row
i = i + 1
End If
Next k
Application.ScreenUpdating = prev
End Sub
Note that this method does not need to assume any datatype or integer limits for the values of the cells in the column.
(Mea Culpa: I had to hand-enter this in Notepad, because my Excel is busy running project tests right now. So there may be some spelling/syntax errors...)
Upvotes: 3
Reputation: 149287
This is a followup to my comment. Looping 50k records + Looping the Array will be an over kill for such a simple operation.
Like I mentioned in my comment, copy the values from the array to a new sheet. Then insert a blank column next to the 50k entries and do a Vlookup
or CountIf
. Once done, do an Autofilter and then delete the duplicate entries in 1 go. Let's take an example and see how this will work.
Let's say we have have an array with 1000 items? and in 1 sheet we have 50k data. The below code will be tested with 1000 items in Array
and 50k Data
See Snapshot
Paste this code in a module (The code took less then 5 secs to finish)
Sub Sample()
Dim ws As Worksheet, wstemp As Worksheet
Dim LRow As Long
Dim Ar(1 To 1000) As Long
Dim startTime As String, EndTime As String
startTime = Format(Now, "hh:mm:ss")
Set ws = Sheets("Sheet1")
Set wstemp = Sheets.Add
'~~> Creating a dummy array
For i = 1 To 1000
Ar(i) = i
Next i
'~~> Copy it to the new sheet
wstemp.Range("A1:A1000").Value = Application.Transpose(Ar)
With ws
LRow = .Range("A" & .Rows.Count).End(xlUp).Row
.Columns(2).Insert Shift:=xlToRight
.Range("B1").Value = "For Deletion"
.Range("B2:B" & LRow).FormulaR1C1 = "=COUNTIF(" & wstemp.Name & "!C[-1],RC[-1])"
.Columns(2).Value = .Columns(2).Value
'~~> Remove any filters
.AutoFilterMode = False
'~~> Filter, offset(to exclude headers) and delete visible rows
With .Range("B1:B" & LRow)
.AutoFilter Field:=1, Criteria1:="<>0"
.Offset(1, 0).SpecialCells(xlCellTypeVisible).EntireRow.Delete
End With
'~~> Remove any filters
.AutoFilterMode = False
.Columns(2).Delete
End With
EndTime = Format(Now, "hh:mm:ss")
MsgBox "The process started at " & startTime & " and finished at" & EndTime
End Sub
Upvotes: 1
Reputation: 23
I would suggest filltering your column and then use a formula to find the duplicates and delete them. I don't have the actually code for you (you didn't give us any code)
dim a as range
dim b as range
set a = Range ("A1")
Do while Not isEmpty(A)
Set b = a.offset(1,0)
If b = a then
b= ""
else a.offset (1,0)
Loop
I am sure you could put the filter in the code or just rember to fillter before you run the macro.
Upvotes: 0