Reputation: 23
I need some help speeding up the current code I am running.
To start, I have a large data
sheet That has roughly 180,000 lines, and a unique
sheet that has only the unique values from that large list that is roughly 9000 lines so it will currently take too long to make this code feasible. The Current i
and j
values are just place holders to test if the code was working.
I had the idea to create a collection to store the data so that once it was matched, it could then be deleted from the collection so it was not needed to be checked again later for another value in the uniqueArray()
.
Is a collection possible since i need to check for 3 conditions before adding the value of the 4th cell?
I really appreciate any help or advice since i have really only been programming in VBA for a few weeks here and there.
Sub getHours(uniqueArray() As Variant, Lastrow As Integer)
Dim i As Integer, lastData As Long
Dim tempTerms As Integer
Dim OpenForms
Sheets("Data").Select
lastData = Range("A2").End(xlDown).Row
For i = 1 To Lastrow
uniqueArray(i, 2) = 0
Next i
i = 0
For i = 1 To 10 'Lastrow
tempTerms = 0
tempProj = uniqueArray(i, 1)
If i Mod 30 = 0 Then
openform = DoEvents
End If
For j = 2 To 10000 'lastData
If tempProj = Cells(j, 10).Value _
And Cells(j, 5).Value = 55 Then
tempTerms = tempTerms + Cells(j, 8).Value
End If
Next j
uniqueArray(i, 2) = tempTerms
Application.StatusBar = i
Next i
End Sub
Upvotes: 2
Views: 195
Reputation: 3324
Load 180K rows into array, MUST sort 180K array, then binary search against that sorted array.
Use memo of matched row for each iteration of outer loop, then stop testing for conditions on inner loop once matching is finished. Go easy on the interface updates.
Doevents on each outer iteration is enuff to get by. Just a dump of adequate functions below:
Option Explicit
Sub getHours()
Dim arr1 As Variant, arr2 As Variant
arr1 = Sheet1.Range("A2:B9001").Value2
arr2 = Sheet2.Range("A2:J180001").Value2 'whatever your range is
QuickSort1 arr2, 10 'sorting data on column 10 as you had it.
Dim i As Long, j As Long, tempSum As Long
For i = 1 To UBound(arr1)
tempSum = 0
Dim retArr As Variant
retArr = wsArrayBinaryLookup(arr1(i, 1), arr2, 10, 10, False)
If Not IsError(retArr(0)) Then
If arr1(i, 1) = retArr(0) Then
Dim matchRow As Long
matchRow = retArr(1)
'Go through from matched row till stop matching
Do
If arr2(matchRow, 10) <> arr1(i, 1) Then Exit Do
If arr2(matchRow, 5) = 55 Then
tempSum = tempSum + arr2(matchRow, 8)
End If
matchRow = matchRow + 1
Loop While matchRow <= UBound(arr2)
End If
End If
arr1(i, 2) = tempSum
DoEvents
Next i
Sheet1.Range("A2:B9001").Value2 = arr1
End Sub
Public Sub QuickSort1( _
ByRef pvarArray As Variant, _
ByVal colToSortBy, _
Optional ByVal plngLeft As Long, _
Optional ByVal plngRight As Long)
Dim lngFirst As Long
Dim lngLast As Long
Dim varMid As Variant
Dim varSwap As Variant
If plngRight = 0 Then
plngLeft = LBound(pvarArray)
plngRight = UBound(pvarArray)
End If
lngFirst = plngLeft
lngLast = plngRight
varMid = pvarArray((plngLeft + plngRight) \ 2, colToSortBy)
Do
Do While pvarArray(lngFirst, colToSortBy) < varMid And lngFirst < plngRight
lngFirst = lngFirst + 1
Loop
Do While varMid < pvarArray(lngLast, colToSortBy) And lngLast > plngLeft
lngLast = lngLast - 1
Loop
Dim arrColumn As Long
If lngFirst <= lngLast Then
For arrColumn = 1 To UBound(pvarArray, 2)
varSwap = pvarArray(lngFirst, arrColumn)
pvarArray(lngFirst, arrColumn) = pvarArray(lngLast, arrColumn)
pvarArray(lngLast, arrColumn) = varSwap
Next arrColumn
lngFirst = lngFirst + 1
lngLast = lngLast - 1
End If
Loop Until lngFirst > lngLast
If plngLeft < lngLast Then QuickSort1 pvarArray, colToSortBy, plngLeft, lngLast
If lngFirst < plngRight Then QuickSort1 pvarArray, colToSortBy, lngFirst, plngRight
End Sub
Public Function wsArrayBinaryLookup( _
ByVal val As Variant, _
arr As Variant, _
ByVal searchCol As Long, _
ByVal returnCol As Long, _
Optional exactMatch As Boolean = True) As Variant
Dim a As Long, z As Long, curr As Long
Dim retArr(0 To 1) As Variant
retArr(0) = CVErr(xlErrNA)
retArr(1) = 0
wsArrayBinaryLookup = retArr
a = LBound(arr)
z = UBound(arr)
If compare(arr(a, searchCol), val) = 1 Then
Exit Function
End If
If compare(arr(a, searchCol), val) = 0 Then
retArr(0) = arr(a, returnCol)
retArr(1) = a
wsArrayBinaryLookup = retArr
Exit Function
End If
If compare(arr(z, searchCol), val) = -1 Then
Exit Function
End If
While z - a > 1
curr = Round((CLng(a) + CLng(z)) / 2, 0)
If compare(arr(curr, searchCol), val) = 0 Then
z = curr
retArr(0) = arr(curr, returnCol)
retArr(1) = curr
wsArrayBinaryLookup = retArr
End If
If compare(arr(curr, searchCol), val) = -1 Then
a = curr
Else
z = curr
End If
Wend
If compare(arr(z, searchCol), val) = 0 Then
retArr(0) = arr(z, returnCol)
retArr(1) = z
wsArrayBinaryLookup = retArr
Else
If Not exactMatch Then
retArr(0) = arr(a, returnCol)
retArr(1) = a
wsArrayBinaryLookup = retArr
End If
End If
End Function
Public Function compare(ByVal x As Variant, ByVal y As Variant) As Long
If IsNumeric(x) And IsNumeric(y) Then
Select Case x - y
Case Is = 0
compare = 0
Case Is > 0
compare = 1
Case Is < 0
compare = -1
End Select
Else
If TypeName(x) = "String" And TypeName(y) = "String" Then
compare = StrComp(x, y, vbTextCompare)
End If
End If
End Function
Upvotes: 0
Reputation: 71187
Sub getHours(uniqueArray() As Variant, Lastrow As Integer)
The procedure is implicily Public
, and the parameters are implicitly passed ByRef
. As a maintainer I would expect a method named getHours
to get me the "hours", whatever that is - but a Sub
procedure doesn't return anything to its callers, like a Function
does. Hence the name is misleading. Procedures do something, they need a descriptive name that says what it does, ...and then the code needs to do what the name says.
Consistency is also important: you have a camelCase
public procedure name, and then mixed-up camelCase
and PascalCase
parameter names. Stick to PascalCase
for module members, and use camelCase
for locals/parameters. Or whatever - just be consistent about it.
LastRow
being an Integer
raises a flag. Integer
is a 16-bit signed integer type, making its maximum value 32,767, which will cause problems when you try to assign it to 32,768 or higher. Use a Long
instead - a 32-bit signed integer type much more appropriate for general-purpose integer values - especially for something like a "row number", which can be well above 100K in Excel.
Dim i As Integer, lastData As Long
i
should be a Long
, and lastData
is assigned, but never referred to - remove it and its assignment. Speaking of which...
Sheets("Data").Select
lastData = Range("A2").End(xlDown).Row
Don't .Select
worksheets. Use a Worksheet
object instead:
Dim dataSheet As Worksheet
Set dataSheet = ThisWorkbook.Worksheets("Data")
Note that Range
, unqualified with a Worksheet
object, implicitly refers to whatever worksheet is active, in whatever workbook is active. Unless you're in a worksheet module's code-behind - in which case it refers to that worksheet. If you mean to do that, be explicit and do Me.Range
instead. If not, then properly qualify Range
and Cells
calls with a Worksheet
object.
Then use it:
lastData = dataSheet.Range("A2").End(xlDown).Row
Some more integers:
Dim tempTerms As Integer
Again, no reason to use a 16-bit integer type, declare As Long
.
Dim OpenForms
What the heck does this procedure needs to know the number of open forms for? It doesn't. Remove it.
openform = DoEvents
You're assigning to openform
, but you declared OpenForms
. If your code compiles and runs, it means you haven't specified Option Explicit
at the top of the module. Do it. That will prevent VBA from happily compiling typos, and will force you to declare every variable you use. Here OpenForms
is left unused, and openform
is an undeclared Variant
declared on-the-fly by the VBA run-time.
To be honest I didn't even know DoEvents
returned anything - it returning the number of open forms strikes me as a giant WTF. Anyway, here's how I've always seen it used:
DoEvents
That's all! Yes, this discards the returned value. But who cares about the number of opened forms in the first place?
tempProj
isn't declared. Declare it. j
isn't declared. Declare it.
Reading a cell's value is dangerous. Cells contain a Variant
, so whenever you read a cell's value into a String
or Long
or whatever typed variable, you are making VBA perform an implicit type conversion - a conversion that isn't always possible.
This will eventually break - or come back and bite you in this or another project:
If tempProj = Cells(j, 10).Value _ And Cells(j, 5).Value = 55 Then tempTerms = tempTerms + Cells(j, 8).Value End If
You need to be sure the cell doesn't contain an error value before you can do that.
If IsError(Cells(j, 10).Value) Or IsError(Cells(j, 5).Value) Or IsError(Cells(j, 8).Value) Then
MsgBox "Row " & j & " contains an error value in column 5, 8, or 10."
Exit Sub
End If
Variant
when a better type exists.Variant
. Use Option Explicit
.Select
and Activate
.DoEvents
.Read the worksheet's data into a variant array:
Dim dataSheet As Worksheet
Set dataSheet = ThisWorkbook.Worksheets("Data")
Dim sheetData As Variant
sheetData = dataSheet.Range("A1:J" & lastData).Value
Now sheetData
is a 2D array that contains every single value in the specified range - all copied in-memory in a split-second.
So the j
loop becomes something like this1:
Dim j As Long
For j = 2 To lastData
If tempProj = sheetData(j, 10) And sheetData(j, 5) = 55 Then
tempTerms = tempTerms + sheetData(j, 8)
End If
Next j
Now I see what you're doing. uniqueArray
is your return value! Hard to tell by just looking at the method's signature - naming it result
or better, outHoursPerTerm
, would go a long way into making the code easier to understand at a glance.
Consider setting the Application.Cursor
to a hourglass and set it back to default once done - possibly also setting the status bar to "Please wait..." or something similar. If the thing takes longer than 5-8 seconds, then consider updating the status bar for every couple iterations of the outer loop, but note that doing that will make the procedure considerably slower.
Toggling calculation, worksheet events, screen updating and whatnot, isn't going to help here - you're not writing anywhere, only reading. Work off an in-memory 2D array and you should see considerable performance improvements.
This answer purposely reads like a Code Review answer. Questions about improving working code (performance, readability, etc.) are usually a better fit on CR. Consider asking on CR next time you need help improving your working code - as you can see a CR answer covers much more ground than a typical SO answer.
1Not tested, written in the answer box. Might need to transpose rows into columns.
Upvotes: 1
Reputation: 43585
This is what I usually use for speeding:
Public Sub OnEnd()
Application.ScreenUpdating = True
Application.EnableEvents = True
Application.AskToUpdateLinks = True
Application.DisplayAlerts = True
Application.Calculation = xlAutomatic
ThisWorkbook.Date1904 = False
Application.StatusBar = False
End Sub
Public Sub OnStart()
Application.ScreenUpdating = False
Application.EnableEvents = False
Application.AskToUpdateLinks = False
Application.DisplayAlerts = False
Application.Calculation = xlAutomatic
ThisWorkbook.Date1904 = False
ActiveWindow.View = xlNormalView
End Sub
Sub getHours(uniqueArray() As Variant, Lastrow As Integer)
Dim i As Integer, lastData As Long
Dim tempTerms As Integer
Dim OpenForms
call OnStart
code ...
Next i
call OnEnd
End Sub
The ScreenUpdating = False
does about 90% of the job, the rest is there just to make sure it runs as expected.
Edit:
Theoretically, if you change Dim tempTerms As Integer
to Long
it should be faster. And probably it is better to define OpenForms
as something.
Upvotes: 0