Reputation: 45
I'm writing some code to reformat some datasets. At the end, I have some empty columns in my output dataset, which I need to populate by looking up values from another sheet (Index-Match effectively).
I have tried many approaches, and two were successful. The problem is, they are far too slow, and unfortunately I am limited to using VBA. To prefix, I'm aware this problem could be solved by simply writing the Index-Match formula into the cells manually at the end and dragging them down, but I am trying to make life as simple as possible for the people who will need to perform this task monthly (also to reduce room for error).
Background info:
ws_ps example Column B as lookup value, with Column E to populate if a row in Column B is not empty
ws_priips example Column G corresponds to the lookup value in ws_ps Column B, with the value to look for coming from Column E
Approach 1: Looping through each row and using WorksheetFunction - 12 min runtime
On Error Resume Next
last_row_ps = ws_ps.UsedRange.Rows.Count
For ps_row = 2 To last_row_ps
ws_ps.Cells(ps_row, 5).Value = WorksheetFunction.IfError(WorksheetFunction.Index(ws_priips.Range("A:G"), _
WorksheetFunction.Match(ws_ps.Cells(ps_row, 2), ws_priips.Range("G:G"), 0), 5), "")
Next ps_row
On Error GoTo -1
Approach 2: Loading worksheets into arrays and writing to worksheet if conditions are met - 15min runtime
last_row_ps = ws_ps.UsedRange.Rows.Count
last_row_priips = ws_priips.UsedRange.Rows.Count
ps_array = ws_ps.Range("A1:X" & last_row_ps).Value
priips_array = ws_priips.Range("A1:AZ" & last_row_priips).Value
For ps_row = 2 To UBound(ps_array, 1)
For priips_row = 2 To UBound(priips_array, 1)
If ps_array(ps_row, 2) <> "" Then
If ps_array(ps_row, 2) = priips_array(priips_row, 7) Then
ws_ps.Cells(ps_row, 5).Value = priips_array(priips_row, 5)
GoTo SkipLoop
End If
Else
GoTo SkipLoop
End If
Next priips_row
SkipLoop:
Next ps_row
Am I out of luck? So far I've only implemented these solutions for 1 column, but I will need to apply it to around 10. Is there any way to drastically speed things up, without resorting to typing the functions into the worksheet or using python? I would be grateful for any pointers. I am still very much a beginner.
Upvotes: 4
Views: 183
Reputation: 4528
Another way which can help is the concurrent loop approach (no external library, suitable for Mac):
Option Explicit
Sub Way2()
Dim arrDest, arrSrc, ws As Worksheet, rngDest As Range, d&, s&, r&
Application.ScreenUpdating = False
Set ws = ThisWorkbook.ActiveSheet
Set rngDest = ws.Range("A1").CurrentRegion
r = rngDest.Rows.Count
[B2].Resize(r - 1, 1) = Application.Evaluate("SEQUENCE(" & (r - 1) & ")")
rngDest.Sort Key1:=[A:A], Header:=xlYes
arrDest = rngDest.Value
arrSrc = ws.Range("I1").CurrentRegion.Value
s = 2
For d = 2 To r
While arrDest(d, 1) <> arrSrc(s, 1)
s = s + 1
Wend
arrDest(d, 3) = arrSrc(s, 2)
arrDest(d, 5) = arrSrc(s, 3)
Next
rngDest.Value = arrDest
rngDest.Sort Key1:=[B:B], Header:=xlYes
[B2].Resize(r - 1, 1).ClearContents
Application.ScreenUpdating = True
End Sub
I tested it on the 50000 reference table and the 300000 data table. Despite additional operations, this method is only 15% slower than using a dictionary. It's expected that the reference table is ordered (which is typical), otherwise it can be temporary sorted same as the data table.
This method can be helpful if a dictionary is not applicable, e. g. if range comparison is required.
Limitation: the data table should contain only eligible keys. But it can be resolved.
To measure the execution time, use the next addition for a sub:
Public Declare PtrSafe Function GetTickCount Lib "kernel32.dll" () As Long
Sub Anysub()
Dim t As Long
t = GetTickCount
' payload code here
Debug.Print GetTickCount - t
End Sub
The subroutine for the data table without key integrity:
Sub Way2_NoIntegrity()
Dim arrDest, arrSrc, ws As Worksheet, d&, s&, r&, m&, p&, rngDest As Range
Application.ScreenUpdating = False
Set ws = ThisWorkbook.ActiveSheet
Set rngDest = ws.Range("A1").CurrentRegion
r = rngDest.Rows.Count
[B2].Resize(r - 1, 1) = Application.Evaluate("SEQUENCE(" & (r - 1) & ")")
rngDest.Sort Key1:=[A:A], Header:=xlYes
arrDest = rngDest.Value
arrSrc = ws.Range("I1").CurrentRegion.Value
p = UBound(arrSrc, 1)
s = 2: m = 2
For d = 2 To r
While arrDest(d, 1) <> arrSrc(s, 1)
s = s + 1
If s > p Then
d = d + 1
If d > r Then Exit For
s = m
End If
Wend
m = s
arrDest(d, 3) = arrSrc(s, 2)
arrDest(d, 5) = arrSrc(s, 3)
Next
rngDest.Value = arrDest
rngDest.Sort Key1:=[B:B], Header:=xlYes
[B2].Resize(r - 1, 1).ClearContents
Application.ScreenUpdating = True
End Sub
The subroutine for range comparison:
Sub Way2RangeComp()
Dim arrDest, arrSrc, ws As Worksheet, d&, s&, r&, m&, p&, rngDest As Range
Application.ScreenUpdating = False
Set ws = ThisWorkbook.ActiveSheet
Set rngDest = ws.Range("A1").CurrentRegion
r = rngDest.Rows.Count
[B2].Resize(r - 1, 1) = Application.Evaluate("SEQUENCE(" & (r - 1) & ")")
rngDest.Sort Key1:=[A:A], Header:=xlYes
arrDest = rngDest.Value
arrSrc = ws.[I1].CurrentRegion.Value
p = UBound(arrSrc, 1)
s = 2: m = 2
For d = 2 To r
While arrDest(d, 1) > arrSrc(s, 1)
s = s + 1
If s > p Then
d = d + 1
If d > r Then Exit For
s = m
End If
Wend
m = s
arrDest(d, 3) = arrSrc(s, 2)
arrDest(d, 5) = arrSrc(s, 3)
Next
rngDest.Value = arrDest
rngDest.Sort Key1:=[B:B], Header:=xlYes
[B2].Resize(r - 1, 1).ClearContents
Application.ScreenUpdating = True
End Sub
More efficient (20% better then with a dictionary!) algorithm for the data table without key integrity (of course, the case with the full key integrity is included):
Sub Way2_NoIntegrity_Advanced()
Dim arrDest, arrSrc, arrInd, ws As Worksheet, rngDest As Range
Dim d&, s&, r&, m&, p&, ind&
Application.ScreenUpdating = False
Set ws = ThisWorkbook.ActiveSheet : Set rngDest = Range(ws.[E2], ws.[A2].End(xlDown))
arrDest = rngDest.Value : r = UBound(arrDest, 1)
arrInd = Application.SortBy(Application.Evaluate("SEQUENCE(" & r & ")"), _
rngDest.Columns(1).Value)
arrSrc = ws.Range("I1").CurrentRegion.Value : p = UBound(arrSrc, 1) : s = 2: m = 2
For d = 1 To r
ind = arrInd(d, 1)
While arrDest(ind, 1) <> arrSrc(s, 1)
s = s + 1
If s > p Then
d = d + 1: ind = arrInd(d, 1)
If d > r Then Exit For
s = m
End If
Wend
m = s : arrDest(ind, 3) = arrSrc(s, 2) : arrDest(ind, 5) = arrSrc(s, 3)
Next
rngDest.Value = arrDest
Application.ScreenUpdating = True
End Sub
Upvotes: 0
Reputation: 166306
Read the data from the lookup sheet into a 2D array, then build a dictionary object with the ISIN value as the key, and the array "row" number as the value. Use the dictionary to extract matched row values from the array to populate to the sheet being filled. Faster if you first pull the "to be filled" and ISIN columns from the output dataset sheet, fill them using the dictionary and the lookup array, then at the end write them back to the output sheet. Basically try to avoid any cell-by-cell operations.
Using the code below, with both tables containing 150k rows (second table sorted randomly), I filled two columns in the first table in 3-4 sec.
Note if you need cross-platform support you can use a Collection in place of the dictionary, which is not available on Mac.
Sub Tester()
Dim arrDest, arrSrc, ws As Worksheet, isin As String
Dim dict As Object, r As Long, rMatch As Long
Dim rngDest As Range
'both my tables are on one sheet for testing
Set ws = ThisWorkbook.Worksheets("Sheet1")
'table to be filled
Set rngDest = ws.Range("A1").CurrentRegion
arrDest = rngDest.Value 'read to 2D array
'data used as lookup table, as an array
arrSrc = ws.Range("I1").CurrentRegion.Value
Set dict = CreateObject("scripting.dictionary")
'map isin to row number in lookup table array
For r = 2 To UBound(arrSrc, 1)
isin = arrSrc(r, 1)
If Len(isin) > 0 Then dict(isin) = r 'assumes no duplicates in lookup table
Next r
'loop the report table and try to match values in the lookup table
For r = 2 To UBound(arrDest, 1)
isin = arrDest(r, 1)
If Len(isin) > 0 Then
If dict.Exists(isin) Then 'have a match?
rMatch = dict(isin) 'matched row
arrDest(r, 3) = arrSrc(rMatch, 2) 'copy a couple of values
arrDest(r, 5) = arrSrc(rMatch, 3)
End If 'have match
End If 'not zero-length
Next r
rngDest.Value = arrDest 'replace data with updated array
End Sub
My test tables (shorter version):
Added: for anyone who needs cross-platform support (Windows/Mac) here's the basic "row map" approach using a Collection in place of a Dictionary. Loading the collection with 150k items takes about 10% longer than loading a dictionary of the same size, but retrieval by key is actually about 5x faster using a collection. Also seems like the Collection performance scales better for larger numbers of keys - for 400k items load and read is approx. 2.2 and 0.7 sec but for dictionary it's approx. 10.5 and 10 sec (much slower)
'Using a Collection like a dictionary for mapping column values
' to their position in a dataset
Sub CollectionTest()
Dim arr, r As Long, col As Collection, t, k As String, v
Set col = New Collection
'Source range A1:A150000 filled with "Val_000001", "Val_000002", etc
' sorted randomly
arr = Range("A1").CurrentRegion.Value
t = Timer
For r = 1 To UBound(arr, 1)
k = arr(r, 1)
'Add the row number as value and cell content as key
' note your row keys should be unique
If IsEmpty(KeyValue(col, k)) Then col.Add r, k
Next r
Debug.Print col.Count & " items"
Debug.Print "Loaded row map in " & Timer - t
t = Timer
For r = 1 To 150000
k = "Val_" & Format(r, "000000")
v = KeyValue(col, k)
If r < 5 Then Debug.Print "Key " & k & " at row# " & v
Next r
Debug.Print "Retrieved values in " & Timer - t
End Sub
'Retrieve value for key `k` from collection `col`
' Returns Empty if there's no such key
Function KeyValue(col As Collection, k As String)
On Error Resume Next 'ignore error if no match for `k`
KeyValue = col.Item(k)
End Function
Upvotes: 6
Reputation: 7759
Dictionary are the best choose for matching key values.
Read: Analyst Cave - Excel VLOOKUP vs INDEX MATCH vs SQL vs VBA
Watch: Excel VBA Introduction Part 39 - Dictionaries
Sub UpdateWSPD(ws_ps As Worksheet, ws_priips As Worksheet)
Const priips_IDColumn As Long = 7
Const ps_IDColumn As Long = 2
Const priips_ValueColumn As Long = 5
Const ps_ValueColumn As Long = 5
Const KeyPrefix As String = "Key"
Dim psData As Variant, priipsData As Variant
psData = ws_ps.Range("A1").CurrentRegion.Value
priipsData = ws_priips.Range("A1").CurrentRegion.Value
Dim Map As Object
Set Map = CreateObject("Scripting.Dictionary")
Dim Key As String
Dim r As Long
For r = 2 To UBound(priipsData)
Key = KeyPrefix & priipsData(r, priips_IDColumn)
Map(Key) = r
Next
Dim priipsRow As Long
For r = 2 To UBound(ws_ps)
Key = KeyPrefix & ws_ps(r, ps_IDColumn)
If Map.Exists(Key) Then
priipsRow = Map(Key)
ws_ps(r, ps_ValueColumn) = priipsData(priipsRow, priips_ValueColumn)
End If
Next
Application.ScreenUpdating = False
ws_ps.Range("A1").CurrentRegion.Value = ws_ps
Application.ScreenUpdating = True
End Sub
Upvotes: 4
Reputation: 5848
Is Power Query an option? Untested so you might need to adjust a thing or two:
let
// Load 'ws_ps' and 'ws_priips' queries
ps_query = Excel.CurrentWorkbook(){[Name="ws_ps"]}[Content],
priips_query = Excel.CurrentWorkbook(){[Name="ws_priips"]}[Content],
// Merge 'ps_query' with 'priips_query' using Column B and Column G
merged_table = Table.NestedJoin(ps_query, "Column B", priips_query, "Column G", "priips_data", JoinKind.LeftOuter),
// Expand the 5th column (E) from the merged data
expanded_table = Table.ExpandTableColumn(merged_table, "priips_data", {"Column E"}),
// Replace null values with an empty string
result_table = Table.ReplaceValue(expanded_table, null, "", Replacer.ReplaceValue, {"Column E"})
in
result_table
Upvotes: 1