Reputation: 15
Every morning I have to compare manually row by row the information in my database with the one sent by the broker. Normally they should have exactly the same information. The information is about executed trades.
1- I need to compare row by row looking at the values "Stock", "Qty", "Price" and "Date". If any row is not matched from each other (i.e. a value is erroneously different in one of the table or the entire row is missing), I need the unmatched rows to be printed in a third "OUTPUT" table.
My TABLE, BROKER'S TABLE & OUTPUT
2- The thing here is that there are duplicates like for "microsoft" or "nvidia" that are independent trades (different IDs). Duplicates must be kept in the comparison because they are different trades.
How could I manage the duplicates problems? Using Collections in Dictionary could help me? I would compare Table A to Table B and then Table B to Table A. Or the presence of duplicates (actually independent trades) makes it impossible to perform ?
My file has more than 500 rows.
Upvotes: 0
Views: 1471
Reputation: 5770
I'll add an entirely in-memory way to achieve the output.
The one thing worth considering is that we have no way of differentiating one ID from the next if they have matching stock, price, qty, and dates. So, the way I'm handling duplicates is in the order that they appear in the spreadsheet. So if there are two matching entries in your table, and only 1 in the broker table, I assume that the first entry in your table matches the first entry in the broker table, and therefore your second entry will be output.
Try working through the code. I took a few shortcuts for the sake of timing, but I also encapsulated each of the functions so that you can modify as needed. You'll just have to build out the DeserializeKey function to convert a key back to cells in the output sheet (shouldn't be too hard). See the below code, and let me know if it meets expectations.
Note: You will run the "CompareDictionaries" subroutine. The others are helper functions.
Sub CompareDictionaries()
Dim oMine As Object
Dim oBroker As Object
Dim myQueueCount As Long
Dim brokerQueueCount As Long
Dim minQueue As Long
Dim oMinQueue As Object
Set oMine = GetDictionary(Sheet1.Range("A2:E7"))
Set oBroker = GetDictionary(Sheet2.Range("A2:E6"))
For Each oKey In oMine.keys
'The whole row does not exist in the broker table
If Not oBroker.Exists(oKey) Then
Do While oMine(oKey).Count > 0
DeserializeKey oKey, oMine(oKey).dequeue
Loop
Else 'The keys exist in both tables
myQueueCount = oMine(oKey).Count
brokerQueueCount = oBroker(oKey).Count
If myQueueCount = brokerQueueCount Then
'Do nothing. They both have the same number of
'id's, and so we assume they are in sync.
Else
'Determine the minimum queue size, and get rid
'of that many values, since we won't need them
minQueue = IIf(myQueueCount < brokerQueueCount, myQueueCount, brokerQueueCount)
For i = 1 To minQueue
oMine(oKey).dequeue
oBroker(oKey).dequeue
Next i
'Take the remaining IDs out of the dictionary/queue that had more
If brokerQueueCount > myQueueCount Then
Set oMinQueue = oBroker
Else
Set oMinQueue = oMine
End If
Do While oMinQueue(oKey).Count > 0
DeserializeKey oKey, oMinQueue(oKey).dequeue
Loop
End If
End If
Next oKey
'The only remaining thing to test for is keys in the broker dict
'that are not in the myDict
For Each oKey In oBroker.keys
If Not oMine.Exists(oKey) Then
Do While oBroker(oKey).Count > 0
DeserializeKey oKey, oBroker(oKey).dequeue
Loop
End If
Next oKey
End Sub
Function GetDictionary(inputRange As Range) As Object
Dim oDict As Object
Dim sht As Worksheet
Dim cel As Range
Dim theKey As String
Dim oQueue As Object
Set sht = inputRange.Parent
Set oDict = CreateObject("Scripting.Dictionary")
For Each cel In Intersect(inputRange, sht.Columns(1))
theKey = GenerateKey(cel.Resize(, 5))
If oDict.Exists(theKey) Then
oDict(theKey).enqueue cel.Value
Else
Set oQueue = CreateObject("System.Collections.Queue")
oQueue.enqueue cel.Value
oDict.Add theKey, oQueue
End If
Next cel
Set GetDictionary = oDict
End Function
Sub DeserializeKey(ByVal theKey As String, theId As String)
'This is where you'd do some stuff to
'turn the key into individual cells, and store it.
'I'm only writing to the debug widnow to demonstrate
Debug.Print theId & " " & theKey
End Sub
Function GenerateKey(rng As Range) As String
GenerateKey = rng(2) & Format(rng(3), "0") _
& Format(rng(4), "0.00") & Format(rng(5), "mmddyyyy")
End Function
For those interested in this method, I'm editing this answer to add the "deserializekey" function:
Dim r As Long
Worksheets("Output").Activate
r = 1
'What we are doing here with "loopcell" is to check if the destination cells in the "output" sheet are empty or free.
'If not, we go down 1 row.
loopcell:
If IsEmpty(Range("A" & r).Value) = True Then
Range("A" & r).Value = "_" & theId & "_" & theKey
Else
r = r + 1
GoTo loopcell
End If
'The key is wrote to the the cell but we need to split every element of the key in multiple cells.
splitOutput = Range("A" & r).Value
splitArray = Split(splitOutput, "_")
For i = 1 To UBound(splitArray)
Cells(r, i).Value = splitArray(i)
Next i
Debug.Print theId & " " & theKey
End Sub ```
** New GetDictionary and Deserialize methods used to store more info **
Sub DeserializeKey(ByVal theKey As String, theId As Variant)
'This is where you'd do some stuff to
'turn the key into individual cells, and store it.
'I'm only writing to the debug widnow to demonstrate
Dim output As String
'Keep in mind that we have a 2d array, and we are reading
'one row at a time. So get the number of columns in the
'array, and then do whatever you need with them.
For i = LBound(theId, 2) To UBound(theId, 2)
output = output & " " & theId(1, i)
Next i
Debug.Print theKey & " -->" & output
End Sub
Function GetDictionary(inputRange As Range) As Object
Dim oDict As Object
Dim sht As Worksheet
Dim cel As Range
Dim theKey As String
Dim oQueue As Object
Dim columnCount As Long
Dim rngAsArray As Variant
Set sht = inputRange.Parent
'Get the column count of the input range. Since we don't
'hardcode it in, this function is more flexible to
'future changes
columnCount = inputRange.Columns.Count
Set oDict = CreateObject("Scripting.Dictionary")
For Each cel In Intersect(inputRange, sht.Columns(1))
theKey = GenerateKey(cel.Resize(, 5))
'Put the full row into an array, which we will then
'store as the content of the queue
rngAsArray = cel.Resize(, columnCount).Value
If oDict.Exists(theKey) Then
oDict(theKey).enqueue rngAsArray
Else
Set oQueue = CreateObject("System.Collections.Queue")
oQueue.enqueue rngAsArray
oDict.Add theKey, oQueue
End If
Next cel
Set GetDictionary = oDict
End Function
Upvotes: 1
Reputation: 8591
First of all, please read DS_London's comment.
If you would like to have a result sheet, then you can use below macro:
Option Explicit
Sub CompareData()
Dim wbk As Workbook
Dim wshMyData As Worksheet, wshBrokersData As Worksheet, wshResult As Worksheet
Dim i As Integer, j As Integer, k As Integer
Dim sTmp As String
On Error Resume Next
Set wbk = ThisWorkbook
Set wshResult = wbk.Worksheets("Result")
On Error GoTo Err_CompareData
'if there_s no result sheet
If Not wshResult Is Nothing Then
Application.DisplayAlerts = False
wbk.Worksheets("Result").Delete
Application.DisplayAlerts = True
End If
Set wshMyData = wbk.Worksheets("Sheet1")
Set wshBrokersData = wbk.Worksheets("Sheet2")
Set wshResult = wbk.Worksheets.Add(After:=wshBrokersData)
wshResult.Name = "Result"
wshResult.Range("A1") = "ID"
wshResult.Range("B1") = "Stock"
wshResult.Range("C1") = "Qty"
wshResult.Range("D1") = "Price"
wshResult.Range("E1") = "Date"
wshResult.Range("F1") = "My"
wshResult.Range("G1") = "Broker"
wshResult.Range("A1:G1").Interior.Color = vbGreen
'find last entry in your data
i = wshMyData.Range("A" & wshMyData.Rows.Count).End(xlUp).Row
'find last entry in brokers data
j = wshBrokersData.Range("A" & wshBrokersData.Rows.Count).End(xlUp).Row
'copy data into result sheet
k = 2
wshMyData.Range("A2:E" & i).Copy wshResult.Range("A" & k)
k = k + i - 1
wshBrokersData.Range("A2:E" & j).Copy wshResult.Range("A" & k)
k = k + j - 2
'remove duplicates
wshResult.Range("$A$1:$E$" & k).RemoveDuplicates Columns:=Array(2, 3, 4, 5), Header:=xlYes
k = wshResult.Range("A" & wshResult.Rows.Count).End(xlUp).Row
'start comparison ;)
'my data
sTmp = "(" & wshMyData.Name & "!" & wshMyData.Range("B1:B" & i).AddressLocal & "=B2)"
sTmp = sTmp & "*(" & wshMyData.Name & "!" & wshMyData.Range("C1:C" & i).AddressLocal & "=C2)"
sTmp = sTmp & "*(" & wshMyData.Name & "!" & wshMyData.Range("D1:D" & i).AddressLocal & "=D2)"
sTmp = sTmp & "*(" & wshMyData.Name & "!" & wshMyData.Range("E1:E" & i).AddressLocal & "=E2)"
sTmp = "=SUM(IF(" & sTmp & ", 1, 0))"
wshResult.Range("F2").BorderAround LineStyle:=xlContinuous
wshResult.Range("F2").FormulaArray = sTmp
wshResult.Range("F2:F" & k).FillDown
'brokres data
sTmp = "(" & wshBrokersData.Name & "!" & wshBrokersData.Range("B1:B" & i).AddressLocal & "=B2)"
sTmp = sTmp & "*(" & wshBrokersData.Name & "!" & wshBrokersData.Range("C1:C" & i).AddressLocal & "=C2)"
sTmp = sTmp & "*(" & wshBrokersData.Name & "!" & wshBrokersData.Range("D1:D" & i).AddressLocal & "=D2)"
sTmp = sTmp & "*(" & wshBrokersData.Name & "!" & wshBrokersData.Range("E1:E" & i).AddressLocal & "=E2)"
sTmp = "=SUM(IF(" & sTmp & ", 1, 0))"
wshResult.Range("G2").BorderAround LineStyle:=xlContinuous
wshResult.Range("G2").FormulaArray = sTmp
wshResult.Range("G2:G" & k).FillDown
'autofit
wshResult.Range("A:G").Columns.AutoFit
Exit_CompareData:
On Error Resume Next
Set wshMyData = Nothing
Set wshBrokersData = Nothing
Set wshResult = Nothing
Set wbk = Nothing
Exit Sub
Err_CompareData:
MsgBox Err.Description, vbExclamation, Err.Number
Resume Exit_CompareData
End Sub
Result:
As you can see, 0
means that there's no corresponding data in selected sheet.
What above macro does?
Result
, then adds column headers (ID
, Stock
, Qty
, Price
, Date
, My data
, Broker
in row 1 respectively in columns A
-G
)Sheet1
) to Result
sheetSheet2
) to Result
sheetResult
sheet (based on all columns excluding ID
)F2
and G2
and fill it down.Important note: There's at least few other ways to achieve that...
Final note: Feel free to change the code to your needs.
Upvotes: 1