user1521458
user1521458

Reputation: 23

Excel Macro Arrays

Currently I have a macro that runs through a list and deletes duplicate values (in one column), but it's proving to be very inefficient. For every entry that it checks for duplicates, it has to run through the whole column; my file currently has 50,000 entries and that is no small task.

I think an easier way for the macro to work is for the macro to check if this value is in an array. If it is, then remove the row that the entry is in. If it isn't, add the value to the array.

Can someone provide some help with the basic outline of the macro? Thanks

Upvotes: 2

Views: 8545

Answers (4)

Excel Developers
Excel Developers

Reputation: 2825

For Excel 2007 and later: Copy the array to a sheet and use the removeduplicates method:

set ws = worksheets.add
ws.[A1].resize(ubound(yourarray,1),ubound(yourarray,2)).value = yourarray
ws.usedrange.removeduplicates columns:=1, header:=no

This assumes the lower bound of your array is 1, that the column you want to de-duplicate is column 1 and that your list has no headers. You can then find the borders of the new range and read it back into your array (erase the current array first).

Upvotes: 1

danielpiestrak
danielpiestrak

Reputation: 5439

The Below code will loop through your source data and store it in an array, while simultaneously checking for duplicates. After the collection is complete it uses the array as a key to know which columns to delete.

Due to the high number of potentiol screen updates with the deletion be sure to turn screenupdating off. (included)

Sub Example()
    Application.ScreenUpdating = false
    Dim i As Long
    Dim k As Long
    Dim StorageArray() As String
    Dim iLastRow As Long
    iLastRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row

    ReDim StorageArray(1 To iLastRow, 0 To 1)

    'loop through column from row 1 to the last row
    For i = 1 To iLastRow
        'add each sheet value to the first column of the array
        StorageArray(i, 0) = ActiveSheet.Range("A" & i).Value
        '- keep the second column as 0 by default
        StorageArray(i, 1) = 0
        '- as each item is added, loop through previously added items to see if its a duplicate
        For k = 1 To i-1
            If StorageArray(k, 0) = StorageArray(i, 0) Then
                'if it is a duplicate set the second column of the srray to 1
                StorageArray(i, 1) = 1
                Exit For
            End If
        Next k
    Next i

    'loop through sheet backwords and delete rows that were maked for deletion
    For i = iLastRow To 1 Step -1
        If StorageArray(i, 1) = 1 Then
            ActiveSheet.Range("A" & i).EntireRow.Delete
        End If
    Next i

    Application.ScreenUpdating = true
End Sub

As requested, here is a similar way to do it, using Collections instead of an Array for key indexing: (RBarryYoung)

Public Sub RemovecolumnDuplicates()
    Dim prev as Boolean
    prev = Application.ScreenUpdating
    Application.ScreenUpdating = false
    Dim i As Long, k As Long

    Dim v as Variant, sv as String
    Dim cl as Range, ws As Worksheet
    Set ws = ActiveWorksheet    'NOTE: This really should be a parameter ...

    Dim StorageArray As New Collection
    Dim iLastRow As Long
    iLastRow = ws.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row

    'loop through column from row 1 to the last row
    i = 1
    For k = 1 To iLastRow
        'add each sheet value to the collection
        Set cl = ws.Cells(i, 1)
        v = cl.Value
        sv = Cstr(v)

        On Error Resume Next
            StorageArray.Add v, sv
        If Err.Number <> 0 Then
            'must be a duplicate, remove it
            cl.EntireRow.Delete
            'Note: our index doesn't change here, since all of the rows moved
        Else
            'not a duplicate, so go to the next row
            i = i + 1
        End If
    Next k

    Application.ScreenUpdating = prev
End Sub

Note that this method does not need to assume any datatype or integer limits for the values of the cells in the column.

(Mea Culpa: I had to hand-enter this in Notepad, because my Excel is busy running project tests right now. So there may be some spelling/syntax errors...)

Upvotes: 3

Siddharth Rout
Siddharth Rout

Reputation: 149287

This is a followup to my comment. Looping 50k records + Looping the Array will be an over kill for such a simple operation.

Like I mentioned in my comment, copy the values from the array to a new sheet. Then insert a blank column next to the 50k entries and do a Vlookup or CountIf. Once done, do an Autofilter and then delete the duplicate entries in 1 go. Let's take an example and see how this will work.

Let's say we have have an array with 1000 items? and in 1 sheet we have 50k data. The below code will be tested with 1000 items in Array and 50k Data See Snapshot

enter image description here

Paste this code in a module (The code took less then 5 secs to finish)

enter image description here

Sub Sample()
    Dim ws As Worksheet, wstemp As Worksheet
    Dim LRow As Long
    Dim Ar(1 To 1000) As Long
    Dim startTime As String, EndTime As String

    startTime = Format(Now, "hh:mm:ss")

    Set ws = Sheets("Sheet1")
    Set wstemp = Sheets.Add

    '~~> Creating a dummy array
    For i = 1 To 1000
        Ar(i) = i
    Next i

    '~~> Copy it to the new sheet
    wstemp.Range("A1:A1000").Value = Application.Transpose(Ar)

    With ws
        LRow = .Range("A" & .Rows.Count).End(xlUp).Row

        .Columns(2).Insert Shift:=xlToRight
        .Range("B1").Value = "For Deletion"
        .Range("B2:B" & LRow).FormulaR1C1 = "=COUNTIF(" & wstemp.Name & "!C[-1],RC[-1])"
        .Columns(2).Value = .Columns(2).Value

        '~~> Remove any filters
        .AutoFilterMode = False

        '~~> Filter, offset(to exclude headers) and delete visible rows
        With .Range("B1:B" & LRow)
            .AutoFilter Field:=1, Criteria1:="<>0"
            .Offset(1, 0).SpecialCells(xlCellTypeVisible).EntireRow.Delete
        End With

        '~~> Remove any filters
        .AutoFilterMode = False

        .Columns(2).Delete
    End With

    EndTime = Format(Now, "hh:mm:ss")

    MsgBox "The process started at " & startTime & " and finished at" & EndTime
End Sub

Upvotes: 1

Nick
Nick

Reputation: 23

I would suggest filltering your column and then use a formula to find the duplicates and delete them. I don't have the actually code for you (you didn't give us any code)

dim a as range
dim b as range
set a = Range ("A1")

Do while Not isEmpty(A)
Set b = a.offset(1,0)

If b = a then
b= ""
else a.offset (1,0)

Loop

I am sure you could put the filter in the code or just rember to fillter before you run the macro.

Upvotes: 0

Related Questions