Reputation:
My worksheet requires the following array formula in BG2.
=INDEX('Client'!O$2:O$347473,
MATCH(1, (('Client_Cost'!D$2:D$347473='Client'!BC2)*
('Client_Cost'!E$2:E$347473='Client'!BE2)), 0))
This provides a two-column match (Client_Cost!D:D to Client!BC2 AND Client_Cost!E:E to Client!BE2) and returns the corresponding value from Client!O:O.
The large number of rows makes the array formula very calculation-intensive. I can deal with a few hundred rows (~90 seconds for 500 rows) but I need results all the way down to Client'!BG347473 and I would like them sometime this year.
I've tried using Application Evaluate to return a result from the array formula into an variant array and subsequently returning the array of results to the worksheet en masse but it isn't the improvement I was hoping for. Looking for alternatives.
Upvotes: 4
Views: 649
Reputation:
First off, I would recommend developing alternative methods with a smaller data set. 5K or 10K rows will either demonstrate a noticeable improvement or not; you can always expand to the original data set once you are confident you are not going to get into a long 'Not responding' state that you have to crash out of.
A common method of removing the array from that style of array formula¹ is a 'helper' column that concatenates the two values from column D and E in the Client_Cost worksheet into a single, delimited value. For example, in Client_Cost!Z2 as,
=CONCATENATE(Client_Cost!D2, "|", Client_Cost!E2)
Filled down to Client_Cost!Z347473 should only take a second or two.
Once that is set up, a single INDEX/MATCH function pair can provide a vastly more efficient lookup on a similarly concatenated Client!BC2 and Client'!BE2. In Client!BG2 as,
=INDEX(Client!O$2:O$347473,
MATCH(CONCATENATE(Client!BC2, "|", Client!BE2),
Client_Cost'!Z$2:Z$347473, 0))
That will take 1 hr, 51 minutes for 350K rows. While not yet optimal, it is a big improvement over the estimated ~17.5 hours that the original took.
The next logical step in optimizing that method would be working with a VBA Scripting.Dictionary object. A dictionary holds its own unique index on its keys and the concatenated values could be stuffed into a dictionary object to facilitate virtually instantaneous lookups on a large number of items (i.e. rows).
Sub JR_CSE_in_Array()
Dim olr As Long, rws As Long, JR_Count As Long, JR_Values As Variant
Dim v As Long, vTMP As Variant, vTMPs As Variant, dVALs As Object
Debug.Print Timer
Set dVALs = CreateObject("Scripting.Dictionary")
'get some dimensions to the various data ranges
With Worksheets("Client_Cost")
'only use as many rows as absolutely necessary
olr = Application.Min(.Cells(Rows.Count, "D").End(xlUp).Row, _
.Cells(Rows.Count, "E").End(xlUp).Row)
'store D & E
vTMPs = .Range(.Cells(2, 4), .Cells(olr, 5)).Value2
End With
With Worksheets("Client")
rws = Application.Min(.Cells(Rows.Count, "BC").End(xlUp).Row, _
.Cells(Rows.Count, "BE").End(xlUp).Row, _
UBound(vTMPs, 1))
'override the above statement for sampling
'rws = 5000
'building the Dictionary object takes a fair bit of time but it is worth it
vTMP = .Range(.Cells(2, 15), .Cells(olr, 15)).Value2
For v = LBound(vTMPs, 1) To UBound(vTMPs, 1)
If Not dVALs.Exists(Join(Array(vTMPs(v, 1), vTMPs(v, 2)), ChrW(8203))) Then _
dVALs.Add Key:=Join(Array(vTMPs(v, 1), vTMPs(v, 2)), ChrW(8203)), Item:=vTMP(v, 1)
Next v
'store BC and BE
vTMPs = .Range(.Cells(2, 55), .Cells(olr, 57)).Value2
End With
ReDim JR_Values(1 To rws, 1 To 1) 'force a two-dimension, one-based index on the array
'Debug.Print LBound(JR_Values) & ":" & UBound(JR_Values)
For JR_Count = LBound(JR_Values, 1) To UBound(JR_Values, 1) Step 1
If dVALs.Exists(Join(Array(vTMPs(JR_Count, 1), vTMPs(JR_Count, 3)), ChrW(8203))) Then
JR_Values(JR_Count, 1) = dVALs.Item(Join(Array(vTMPs(JR_Count, 1), vTMPs(JR_Count, 3)), ChrW(8203)))
End If
Next JR_Count
With Worksheets("Client")
.Range("BG2").Resize(UBound(JR_Values), 1) = JR_Values
End With
'Debug.Print dVALs.Count
dVALs.RemoveAll: Set dVALs = Nothing
Debug.Print Timer
End Sub
The elapsed time for that routine to run (without helper column(s)) was 45.72 seconds. Breaking it down, it took a full 13.4 seconds just to build the dictionary and the remainder was largely taken up by the actual lookup with a half-second here and there attributed to the bulk seeding of the variant arrays from the worksheets' values.
So the Scripting.Dictionary is the clear winner here. Unfortunately, it is not automatically calculating updates in the various columns when values change but at this stage of development, the worksheet should be set to manual calculation. Setting one of the formula-based solutions into a recalculation event from a single retyped value seems an inefficient expenditure of time.
All-in-all, this makes perfect sense. The original array formula is analogous to an SQL SELECT statement with an INNER JOIN on two fields and if my SELECT statement was running inefficiently the first thing I would do to improve it would be to look at the tables' indexes.
On a related note, any workbook with this much data should be saved as a Excel Binary Workbook regardless of whether it is macro-enabled or not. The file size of a binary workbook (.XLSB) is typically ¹⁄₃ the size of an equivalent .XLSX or .XLSM. Beyond a faster initial load time, many bulk operations should prove faster.
Anyone wishing to test their own optimizations can find my sample .XLSB workbook here for the time being. Don't blindly run the procedures without seeing what you're getting into first.
¹ Array formulas need to be finalized with Ctrl+Shift+Enter↵. Once entered into the first cell correctly, they can be filled or copied down or right just like any other formula. Try and reduce your full-column references to ranges more closely representing the extents of your actual data. Array formulas chew up calculation cycles logarithmically so it is good practise to narrow the referenced ranges to a minimum. See Guidelines and examples of array formulas for more information.
Upvotes: 3