Jeffrey Kramer
Jeffrey Kramer

Reputation: 1345

Remove Duplicates from VBA Array With Condition

At the request of a user, I have rewritten this question with more information and tried to clarify it as much as I possibly can.

I have code that reads a range into an array. Many calculations are performed. The resulting array contains an ID and two values:

ID   Seq   Value
a    1     100
a    2     150
a    3     200
b    1     10
b    2     10
b    3     10

However, the calculation step uses Redim Preserve so I have to store the array as TestArray(1 To 3, 1 To 6).

I need to filter the array for duplicate ID's.

If there is no duplicate, I need to store ID, seq and value.

If there is a duplicate ID, I need to store the ID, seq and value where value is the maximum value for a given ID.

If there is a duplicate ID and there are multiple instances of a maximum value, I want to keep the ID, date and value where the value is the maximum value for a given ID and seq is the minimum seq for a given ID.

Basically, for each ID I want the maximum value and if there are multiple maximums, default to the earliest sequence number.

This is a sample of code that shows how the array is structured and what I need the results to look like.

Sub TestArray()

  Dim TestArray() As Variant
  Dim DesiredResults() As Variant

  TestArray = Array(Array("a", "a", "a", "b", "b", "b"), _
    Array(1, 2, 3, 1, 2, 3), _
    Array(100, 150, 200, 10, 10, 10))
  DesiredResults = Array(Array("a", "b"), Array(3, 1), Array(200, 10))

End Sub

Is there some way to loop through the array and find duplicates and then compare them? I could do this easily in SQL but I am struggling in VBA.

Upvotes: 1

Views: 2155

Answers (1)

Cor_Blimey
Cor_Blimey

Reputation: 3310

I kept my test code in so you can inspect the results and play around. I commented why certain things are being done - hope it helps.

The return array is base 1, in the format (column, row). You can of course change this.

Option Explicit

Public Sub TestProcess()

    Dim testResults
    testResults = GetProcessedArray(getTestArray)
    With ActiveSheet
        .Range( _
            .Cells(1, 1), _
            .Cells( _
                1 + UBound(testResults, 1) - LBound(testResults, 1), _
                1 + UBound(testResults, 2) - LBound(testResults, 2))) _
            .Value = testResults
    End With

End Sub

Public Function GetProcessedArray(dataArr As Variant) As Variant

    Dim c As Collection
    Dim resultsArr
    Dim oldResult, key As String
    Dim i As Long, j As Long, lb1 As Long

    Set c = New Collection
    lb1 = LBound(dataArr, 1) 'just cache the value of the lower bound as we use it a lot

    For j = LBound(dataArr, 2) To UBound(dataArr, 2)

        'extract current result for the ID, if any
        '(note that if the ID's aren't necessarily the same type you can add
        ' the key with  prefix of VarType or TypeName as something like key = CStr(VarType(x)) & "|" & CStr(x))
        key = CStr(dataArr(lb1 + 0, j))
        On Error Resume Next
        oldResult = c(key)

        If Err.Number = 5 Then 'error number if record does not exist

            On Error GoTo 0
            'record doesn't exist so add it
            c.Add Array( _
                key, _
                dataArr(lb1 + 1, j), _
                dataArr(lb1 + 2, j)), _
                key

        Else

            On Error GoTo 0
            'test if new value is greater than old value
            If dataArr(lb1 + 2, j) > oldResult(2) Then
                'we want the new one, so:
                'Collection.Item reference is immutable so remove the record
                c.Remove key
                'and Add the new one
                c.Add Array( _
                    key, _
                    dataArr(lb1 + 1, j), _
                    dataArr(lb1 + 2, j)), _
                    key
            ElseIf dataArr(lb1 + 2, j) = oldResult(2) Then
                'test if new sequence number is less than old sequence number
                If dataArr(lb1 + 1, j) < oldResult(1) Then
                    'we want the new one, so:
                    'Collection.Item reference is immutable so remove the record
                    c.Remove key
                    'and Add the new one
                    c.Add Array( _
                        key, _
                        dataArr(lb1 + 1, j), _
                        dataArr(lb1 + 2, j)), _
                        key
                End If
            End If

        End If

    Next j

    'process results into the desired array format
    ReDim resultsArr(1 To 3, 1 To c.Count)
    For j = 1 To c.Count
        For i = 1 To 3
            resultsArr(i, j) = c(j - LBound(resultsArr, 2) + 1)(i - LBound(resultsArr, 1))
        Next i
    Next j

    GetProcessedArray = resultsArr

 End Function

Private Function getTestArray()

  Dim testArray() As Variant
  Dim flatArray
  Dim i As Long
  ReDim flatArray(0 To 2, 0 To 5)

  testArray = Array( _
    Array("a", "a", "a", "b", "b", "b"), _
    Array(1, 2, 3, 1, 2, 3), _
    Array(100, 150, 200, 10, 10, 10))

  For i = 0 To 5

    flatArray(0, i) = testArray(0)(i)
    flatArray(1, i) = testArray(1)(i)
    flatArray(2, i) = testArray(2)(i)

  Next i

  getTestArray = flatArray

End Function

Upvotes: 5

Related Questions