Xander
Xander

Reputation: 15

VBA Compare 2 arrays and return missing rows INCLUDING DUPLICATES

Every morning I have to compare manually row by row the information in my database with the one sent by the broker. Normally they should have exactly the same information. The information is about executed trades.

1- I need to compare row by row looking at the values "Stock", "Qty", "Price" and "Date". If any row is not matched from each other (i.e. a value is erroneously different in one of the table or the entire row is missing), I need the unmatched rows to be printed in a third "OUTPUT" table.

My TABLE, BROKER'S TABLE & OUTPUT

enter image description here

2- The thing here is that there are duplicates like for "microsoft" or "nvidia" that are independent trades (different IDs). Duplicates must be kept in the comparison because they are different trades.

How could I manage the duplicates problems? Using Collections in Dictionary could help me? I would compare Table A to Table B and then Table B to Table A. Or the presence of duplicates (actually independent trades) makes it impossible to perform ?

My file has more than 500 rows.

Upvotes: 0

Views: 1471

Answers (2)

basodre
basodre

Reputation: 5770

I'll add an entirely in-memory way to achieve the output.

The one thing worth considering is that we have no way of differentiating one ID from the next if they have matching stock, price, qty, and dates. So, the way I'm handling duplicates is in the order that they appear in the spreadsheet. So if there are two matching entries in your table, and only 1 in the broker table, I assume that the first entry in your table matches the first entry in the broker table, and therefore your second entry will be output.

Try working through the code. I took a few shortcuts for the sake of timing, but I also encapsulated each of the functions so that you can modify as needed. You'll just have to build out the DeserializeKey function to convert a key back to cells in the output sheet (shouldn't be too hard). See the below code, and let me know if it meets expectations.

Note: You will run the "CompareDictionaries" subroutine. The others are helper functions.

Sub CompareDictionaries()
    Dim oMine As Object
    Dim oBroker As Object
    Dim myQueueCount As Long
    Dim brokerQueueCount As Long
    Dim minQueue As Long
    Dim oMinQueue As Object
    
    Set oMine = GetDictionary(Sheet1.Range("A2:E7"))
    Set oBroker = GetDictionary(Sheet2.Range("A2:E6"))
    
    For Each oKey In oMine.keys
        'The whole row does not exist in the broker table
        If Not oBroker.Exists(oKey) Then
            Do While oMine(oKey).Count > 0
                DeserializeKey oKey, oMine(oKey).dequeue
            Loop
        
        Else 'The keys exist in both tables
            myQueueCount = oMine(oKey).Count
            brokerQueueCount = oBroker(oKey).Count
            If myQueueCount = brokerQueueCount Then
                'Do nothing. They both have the same number of
                'id's, and so we assume they are in sync.
            Else
                'Determine the minimum queue size, and get rid
                'of that many values, since we won't need them
                minQueue = IIf(myQueueCount < brokerQueueCount, myQueueCount, brokerQueueCount)
                
                For i = 1 To minQueue
                    oMine(oKey).dequeue
                    oBroker(oKey).dequeue
                Next i
                
                'Take the remaining IDs out of the dictionary/queue that had more
                If brokerQueueCount > myQueueCount Then
                    Set oMinQueue = oBroker
                Else
                    Set oMinQueue = oMine
                End If
                
                Do While oMinQueue(oKey).Count > 0
                    DeserializeKey oKey, oMinQueue(oKey).dequeue
                Loop
            End If
        End If
    Next oKey
    
    'The only remaining thing to test for is keys in the broker dict
    'that are not in the myDict
    For Each oKey In oBroker.keys
        If Not oMine.Exists(oKey) Then
            Do While oBroker(oKey).Count > 0
                DeserializeKey oKey, oBroker(oKey).dequeue
            Loop
        End If
    Next oKey
End Sub


Function GetDictionary(inputRange As Range) As Object
    Dim oDict As Object
    Dim sht As Worksheet
    Dim cel As Range
    Dim theKey As String
    Dim oQueue As Object
    
    Set sht = inputRange.Parent
    
    Set oDict = CreateObject("Scripting.Dictionary")
    
    For Each cel In Intersect(inputRange, sht.Columns(1))
        theKey = GenerateKey(cel.Resize(, 5))
        
        If oDict.Exists(theKey) Then
            oDict(theKey).enqueue cel.Value
        Else
            Set oQueue = CreateObject("System.Collections.Queue")
            oQueue.enqueue cel.Value
            oDict.Add theKey, oQueue
        End If
    Next cel
    
    Set GetDictionary = oDict
End Function


Sub DeserializeKey(ByVal theKey As String, theId As String)
    'This is where you'd do some stuff to
    'turn the key into individual cells, and store it.
    'I'm only writing to the debug widnow to demonstrate
    
    Debug.Print theId & " " & theKey
End Sub


Function GenerateKey(rng As Range) As String
    GenerateKey = rng(2) & Format(rng(3), "0") _
        & Format(rng(4), "0.00") & Format(rng(5), "mmddyyyy")

End Function

For those interested in this method, I'm editing this answer to add the "deserializekey" function:

  
    
Dim r As Long
Worksheets("Output").Activate

r = 1

'What we are doing here with "loopcell" is to check if the destination cells in the "output" sheet are empty or free.
'If not, we go down 1 row.
loopcell:
If IsEmpty(Range("A" & r).Value) = True Then
    Range("A" & r).Value = "_" & theId & "_" & theKey
Else
r = r + 1
GoTo loopcell
End If

'The key is wrote to the the cell but we need to split every element of the key in multiple cells. 
splitOutput = Range("A" & r).Value
splitArray = Split(splitOutput, "_")

For i = 1 To UBound(splitArray)
Cells(r, i).Value = splitArray(i)
Next i

    Debug.Print theId & " " & theKey
End Sub ```


** New GetDictionary and Deserialize methods used to store more info **

    Sub DeserializeKey(ByVal theKey As String, theId As Variant)
        'This is where you'd do some stuff to
        'turn the key into individual cells, and store it.
        'I'm only writing to the debug widnow to demonstrate
        Dim output As String
        
        'Keep in mind that we have a 2d array, and we are reading
        'one row at a time. So get the number of columns in the
        'array, and then do whatever you need with them.
        For i = LBound(theId, 2) To UBound(theId, 2)
            output = output & " " & theId(1, i)
        Next i
        
        Debug.Print theKey & " -->" & output
    End Sub


    Function GetDictionary(inputRange As Range) As Object
        Dim oDict As Object
        Dim sht As Worksheet
        Dim cel As Range
        Dim theKey As String
        Dim oQueue As Object
        Dim columnCount As Long
        Dim rngAsArray As Variant
        
        Set sht = inputRange.Parent
        
        'Get the column count of the input range. Since we don't
        'hardcode it in, this function is more flexible to
        'future changes
        columnCount = inputRange.Columns.Count
        
        Set oDict = CreateObject("Scripting.Dictionary")
        
        For Each cel In Intersect(inputRange, sht.Columns(1))
            theKey = GenerateKey(cel.Resize(, 5))
            
            'Put the full row into an array, which we will then
            'store as the content of the queue
            rngAsArray = cel.Resize(, columnCount).Value
            
            If oDict.Exists(theKey) Then
                oDict(theKey).enqueue rngAsArray
            Else
                Set oQueue = CreateObject("System.Collections.Queue")
                oQueue.enqueue rngAsArray
                oDict.Add theKey, oQueue
            End If
        Next cel
        
        Set GetDictionary = oDict
    End Function

Upvotes: 1

Maciej Los
Maciej Los

Reputation: 8591

First of all, please read DS_London's comment.

If you would like to have a result sheet, then you can use below macro:

Option Explicit

Sub CompareData()
    Dim wbk As Workbook
    Dim wshMyData As Worksheet, wshBrokersData As Worksheet, wshResult As Worksheet
    Dim i As Integer, j As Integer, k As Integer
    Dim sTmp As String
    
    On Error Resume Next
    Set wbk = ThisWorkbook
    Set wshResult = wbk.Worksheets("Result")
    
    On Error GoTo Err_CompareData

    'if there_s no result sheet
    If Not wshResult Is Nothing Then
        Application.DisplayAlerts = False
        wbk.Worksheets("Result").Delete
        Application.DisplayAlerts = True
    End If
    
    Set wshMyData = wbk.Worksheets("Sheet1")
    Set wshBrokersData = wbk.Worksheets("Sheet2")
    Set wshResult = wbk.Worksheets.Add(After:=wshBrokersData)
    wshResult.Name = "Result"
    wshResult.Range("A1") = "ID"
    wshResult.Range("B1") = "Stock"
    wshResult.Range("C1") = "Qty"
    wshResult.Range("D1") = "Price"
    wshResult.Range("E1") = "Date"
    wshResult.Range("F1") = "My"
    wshResult.Range("G1") = "Broker"
    wshResult.Range("A1:G1").Interior.Color = vbGreen
    
    'find last entry in your data
    i = wshMyData.Range("A" & wshMyData.Rows.Count).End(xlUp).Row
    'find last entry in brokers data
    j = wshBrokersData.Range("A" & wshBrokersData.Rows.Count).End(xlUp).Row
    'copy data into result sheet
    k = 2
    wshMyData.Range("A2:E" & i).Copy wshResult.Range("A" & k)
    k = k + i - 1
    wshBrokersData.Range("A2:E" & j).Copy wshResult.Range("A" & k)
    k = k + j - 2
    'remove duplicates
    wshResult.Range("$A$1:$E$" & k).RemoveDuplicates Columns:=Array(2, 3, 4, 5), Header:=xlYes
    k = wshResult.Range("A" & wshResult.Rows.Count).End(xlUp).Row
    'start comparison ;)
    'my data
    sTmp = "(" & wshMyData.Name & "!" & wshMyData.Range("B1:B" & i).AddressLocal & "=B2)"
    sTmp = sTmp & "*(" & wshMyData.Name & "!" & wshMyData.Range("C1:C" & i).AddressLocal & "=C2)"
    sTmp = sTmp & "*(" & wshMyData.Name & "!" & wshMyData.Range("D1:D" & i).AddressLocal & "=D2)"
    sTmp = sTmp & "*(" & wshMyData.Name & "!" & wshMyData.Range("E1:E" & i).AddressLocal & "=E2)"
    sTmp = "=SUM(IF(" & sTmp & ", 1, 0))"
    wshResult.Range("F2").BorderAround LineStyle:=xlContinuous
    wshResult.Range("F2").FormulaArray = sTmp
    wshResult.Range("F2:F" & k).FillDown
    'brokres data
    sTmp = "(" & wshBrokersData.Name & "!" & wshBrokersData.Range("B1:B" & i).AddressLocal & "=B2)"
    sTmp = sTmp & "*(" & wshBrokersData.Name & "!" & wshBrokersData.Range("C1:C" & i).AddressLocal & "=C2)"
    sTmp = sTmp & "*(" & wshBrokersData.Name & "!" & wshBrokersData.Range("D1:D" & i).AddressLocal & "=D2)"
    sTmp = sTmp & "*(" & wshBrokersData.Name & "!" & wshBrokersData.Range("E1:E" & i).AddressLocal & "=E2)"
    sTmp = "=SUM(IF(" & sTmp & ", 1, 0))"
    wshResult.Range("G2").BorderAround LineStyle:=xlContinuous
    wshResult.Range("G2").FormulaArray = sTmp
    wshResult.Range("G2:G" & k).FillDown
    'autofit
    wshResult.Range("A:G").Columns.AutoFit
  

Exit_CompareData:
    On Error Resume Next
    Set wshMyData = Nothing
    Set wshBrokersData = Nothing
    Set wshResult = Nothing
    Set wbk = Nothing
    Exit Sub

Err_CompareData:
    MsgBox Err.Description, vbExclamation, Err.Number
    Resume Exit_CompareData

End Sub

Result:

result

As you can see, 0 means that there's no corresponding data in selected sheet.

What above macro does?

  1. Adds new sheet: Result, then adds column headers (ID, Stock, Qty, Price, Date, My data, Broker in row 1 respectively in columns A-G)
  2. Copies all data from your sheet (Sheet1) to Result sheet
  3. Copies all data from broker's sheet (Sheet2) to Result sheet
  4. Removes duplicates in Result sheet (based on all columns excluding ID)
  5. Inserts formula-array in cell F2 and G2 and fill it down.

Important note: There's at least few other ways to achieve that...

Final note: Feel free to change the code to your needs.

Upvotes: 1

Related Questions