Blake Lindeman
Blake Lindeman

Reputation: 23

Speeding up array search, Possibly 2D Collection if that is possible?

I need some help speeding up the current code I am running.

To start, I have a large data sheet That has roughly 180,000 lines, and a unique sheet that has only the unique values from that large list that is roughly 9000 lines so it will currently take too long to make this code feasible. The Current i and j values are just place holders to test if the code was working.

I had the idea to create a collection to store the data so that once it was matched, it could then be deleted from the collection so it was not needed to be checked again later for another value in the uniqueArray().

Is a collection possible since i need to check for 3 conditions before adding the value of the 4th cell?

I really appreciate any help or advice since i have really only been programming in VBA for a few weeks here and there.

Sub getHours(uniqueArray() As Variant, Lastrow As Integer)
    Dim i As Integer, lastData As Long
    Dim tempTerms As Integer
    Dim OpenForms

    Sheets("Data").Select
    lastData = Range("A2").End(xlDown).Row

    For i = 1 To Lastrow
        uniqueArray(i, 2) = 0
    Next i
    i = 0

    For i = 1 To 10 'Lastrow

        tempTerms = 0
        tempProj = uniqueArray(i, 1)

        If i Mod 30 = 0 Then
            openform = DoEvents
        End If

        For j = 2 To 10000  'lastData
            If tempProj = Cells(j, 10).Value _
            And Cells(j, 5).Value = 55 Then
                tempTerms = tempTerms + Cells(j, 8).Value
            End If
        Next j

    uniqueArray(i, 2) = tempTerms
    Application.StatusBar = i

    Next i


End Sub

Upvotes: 2

Views: 195

Answers (3)

MacroMarc
MacroMarc

Reputation: 3324

Load 180K rows into array, MUST sort 180K array, then binary search against that sorted array.

Use memo of matched row for each iteration of outer loop, then stop testing for conditions on inner loop once matching is finished. Go easy on the interface updates.

Doevents on each outer iteration is enuff to get by. Just a dump of adequate functions below:

Option Explicit

Sub getHours()
  Dim arr1 As Variant, arr2 As Variant
  arr1 = Sheet1.Range("A2:B9001").Value2
  arr2 = Sheet2.Range("A2:J180001").Value2  'whatever your range is

  QuickSort1 arr2, 10   'sorting data on column 10 as you had it.

  Dim i As Long, j As Long, tempSum As Long

  For i = 1 To UBound(arr1)
        tempSum = 0

        Dim retArr As Variant
        retArr = wsArrayBinaryLookup(arr1(i, 1), arr2, 10, 10, False)
        If Not IsError(retArr(0)) Then
        If arr1(i, 1) = retArr(0) Then
              Dim matchRow As Long
              matchRow = retArr(1)
              'Go through from matched row till stop matching
              Do
                    If arr2(matchRow, 10) <> arr1(i, 1) Then Exit Do
                    If arr2(matchRow, 5) = 55 Then
                          tempSum = tempSum + arr2(matchRow, 8)
                    End If
                    matchRow = matchRow + 1
              Loop While matchRow <= UBound(arr2)
        End If
        End If
        arr1(i, 2) = tempSum
        DoEvents
  Next i

  Sheet1.Range("A2:B9001").Value2 = arr1
End Sub

Public Sub QuickSort1( _
                       ByRef pvarArray As Variant, _
                       ByVal colToSortBy, _
                       Optional ByVal plngLeft As Long, _
                       Optional ByVal plngRight As Long)
  Dim lngFirst As Long
  Dim lngLast As Long
  Dim varMid As Variant
  Dim varSwap As Variant

  If plngRight = 0 Then
        plngLeft = LBound(pvarArray)
        plngRight = UBound(pvarArray)
  End If

  lngFirst = plngLeft
  lngLast = plngRight
  varMid = pvarArray((plngLeft + plngRight) \ 2, colToSortBy)

  Do
        Do While pvarArray(lngFirst, colToSortBy) < varMid And lngFirst < plngRight
              lngFirst = lngFirst + 1
        Loop

        Do While varMid < pvarArray(lngLast, colToSortBy) And lngLast > plngLeft
              lngLast = lngLast - 1
        Loop

        Dim arrColumn As Long
        If lngFirst <= lngLast Then
              For arrColumn = 1 To UBound(pvarArray, 2)
                    varSwap = pvarArray(lngFirst, arrColumn)
                    pvarArray(lngFirst, arrColumn) = pvarArray(lngLast, arrColumn)
                    pvarArray(lngLast, arrColumn) = varSwap
              Next arrColumn
              lngFirst = lngFirst + 1
              lngLast = lngLast - 1
        End If

  Loop Until lngFirst > lngLast

  If plngLeft < lngLast Then QuickSort1 pvarArray, colToSortBy, plngLeft, lngLast
  If lngFirst < plngRight Then QuickSort1 pvarArray, colToSortBy, lngFirst, plngRight
End Sub

Public Function wsArrayBinaryLookup( _
               ByVal val As Variant, _
               arr As Variant, _
               ByVal searchCol As Long, _
               ByVal returnCol As Long, _
               Optional exactMatch As Boolean = True) As Variant

  Dim a As Long, z As Long, curr As Long
  Dim retArr(0 To 1) As Variant

  retArr(0) = CVErr(xlErrNA)
  retArr(1) = 0
  wsArrayBinaryLookup = retArr
  a = LBound(arr)
  z = UBound(arr)


  If compare(arr(a, searchCol), val) = 1 Then
        Exit Function
  End If

  If compare(arr(a, searchCol), val) = 0 Then
        retArr(0) = arr(a, returnCol)
        retArr(1) = a
        wsArrayBinaryLookup = retArr
        Exit Function
  End If

  If compare(arr(z, searchCol), val) = -1 Then
        Exit Function
  End If

  While z - a > 1
        curr = Round((CLng(a) + CLng(z)) / 2, 0)
        If compare(arr(curr, searchCol), val) = 0 Then
              z = curr
              retArr(0) = arr(curr, returnCol)
              retArr(1) = curr
              wsArrayBinaryLookup = retArr
        End If

        If compare(arr(curr, searchCol), val) = -1 Then
              a = curr
        Else
              z = curr
        End If
  Wend

  If compare(arr(z, searchCol), val) = 0 Then
        retArr(0) = arr(z, returnCol)
        retArr(1) = z
        wsArrayBinaryLookup = retArr
  Else
        If Not exactMatch Then
              retArr(0) = arr(a, returnCol)
              retArr(1) = a
              wsArrayBinaryLookup = retArr
        End If
  End If


End Function
Public Function compare(ByVal x As Variant, ByVal y As Variant) As Long

  If IsNumeric(x) And IsNumeric(y) Then
        Select Case x - y
              Case Is = 0
                    compare = 0
              Case Is > 0
                    compare = 1
              Case Is < 0
                    compare = -1
        End Select
  Else
        If TypeName(x) = "String" And TypeName(y) = "String" Then
              compare = StrComp(x, y, vbTextCompare)
        End If
  End If

End Function

Upvotes: 0

Mathieu Guindon
Mathieu Guindon

Reputation: 71187

Sub getHours(uniqueArray() As Variant, Lastrow As Integer)

The procedure is implicily Public, and the parameters are implicitly passed ByRef. As a maintainer I would expect a method named getHours to get me the "hours", whatever that is - but a Sub procedure doesn't return anything to its callers, like a Function does. Hence the name is misleading. Procedures do something, they need a descriptive name that says what it does, ...and then the code needs to do what the name says.

Consistency is also important: you have a camelCase public procedure name, and then mixed-up camelCase and PascalCase parameter names. Stick to PascalCase for module members, and use camelCase for locals/parameters. Or whatever - just be consistent about it.

LastRow being an Integer raises a flag. Integer is a 16-bit signed integer type, making its maximum value 32,767, which will cause problems when you try to assign it to 32,768 or higher. Use a Long instead - a 32-bit signed integer type much more appropriate for general-purpose integer values - especially for something like a "row number", which can be well above 100K in Excel.

Dim i As Integer, lastData As Long

i should be a Long, and lastData is assigned, but never referred to - remove it and its assignment. Speaking of which...

Sheets("Data").Select
lastData = Range("A2").End(xlDown).Row

Don't .Select worksheets. Use a Worksheet object instead:

Dim dataSheet As Worksheet
Set dataSheet = ThisWorkbook.Worksheets("Data")

Note that Range, unqualified with a Worksheet object, implicitly refers to whatever worksheet is active, in whatever workbook is active. Unless you're in a worksheet module's code-behind - in which case it refers to that worksheet. If you mean to do that, be explicit and do Me.Range instead. If not, then properly qualify Range and Cells calls with a Worksheet object.

Then use it:

lastData = dataSheet.Range("A2").End(xlDown).Row

Some more integers:

Dim tempTerms As Integer

Again, no reason to use a 16-bit integer type, declare As Long.

Dim OpenForms

What the heck does this procedure needs to know the number of open forms for? It doesn't. Remove it.

openform = DoEvents

You're assigning to openform, but you declared OpenForms. If your code compiles and runs, it means you haven't specified Option Explicit at the top of the module. Do it. That will prevent VBA from happily compiling typos, and will force you to declare every variable you use. Here OpenForms is left unused, and openform is an undeclared Variant declared on-the-fly by the VBA run-time.

To be honest I didn't even know DoEvents returned anything - it returning the number of open forms strikes me as a giant WTF. Anyway, here's how I've always seen it used:

DoEvents

That's all! Yes, this discards the returned value. But who cares about the number of opened forms in the first place?

tempProj isn't declared. Declare it. j isn't declared. Declare it.


Reading a cell's value is dangerous. Cells contain a Variant, so whenever you read a cell's value into a String or Long or whatever typed variable, you are making VBA perform an implicit type conversion - a conversion that isn't always possible.

This will eventually break - or come back and bite you in this or another project:

If tempProj = Cells(j, 10).Value _
And Cells(j, 5).Value = 55 Then
    tempTerms = tempTerms + Cells(j, 8).Value
End If

You need to be sure the cell doesn't contain an error value before you can do that.

If IsError(Cells(j, 10).Value) Or IsError(Cells(j, 5).Value) Or IsError(Cells(j, 8).Value) Then
    MsgBox "Row " & j & " contains an error value in column 5, 8, or 10."
    Exit Sub
End If

Ok, so what about performance?

  • Avoid Variant when a better type exists.
  • Avoid undeclared variables; they're always Variant. Use Option Explicit.
  • Avoid implicit type conversions.
  • Avoid Select and Activate.
  • Avoid DoEvents.
  • Avoid updating the UI (status bar, etc.).
  • Avoid accessing worksheet cells in a loop.

Read the worksheet's data into a variant array:

Dim dataSheet As Worksheet
Set dataSheet = ThisWorkbook.Worksheets("Data")

Dim sheetData As Variant
sheetData = dataSheet.Range("A1:J" & lastData).Value

Now sheetData is a 2D array that contains every single value in the specified range - all copied in-memory in a split-second.

So the j loop becomes something like this1:

Dim j As Long
For j = 2 To lastData
    If tempProj = sheetData(j, 10) And sheetData(j, 5) = 55 Then
        tempTerms = tempTerms + sheetData(j, 8)
    End If
Next j

Now I see what you're doing. uniqueArray is your return value! Hard to tell by just looking at the method's signature - naming it result or better, outHoursPerTerm, would go a long way into making the code easier to understand at a glance.

Consider setting the Application.Cursor to a hourglass and set it back to default once done - possibly also setting the status bar to "Please wait..." or something similar. If the thing takes longer than 5-8 seconds, then consider updating the status bar for every couple iterations of the outer loop, but note that doing that will make the procedure considerably slower.

Toggling calculation, worksheet events, screen updating and whatnot, isn't going to help here - you're not writing anywhere, only reading. Work off an in-memory 2D array and you should see considerable performance improvements.


This answer purposely reads like a Code Review answer. Questions about improving working code (performance, readability, etc.) are usually a better fit on CR. Consider asking on CR next time you need help improving your working code - as you can see a CR answer covers much more ground than a typical SO answer.


1Not tested, written in the answer box. Might need to transpose rows into columns.

Upvotes: 1

Vityata
Vityata

Reputation: 43585

This is what I usually use for speeding:

Public Sub OnEnd()    
    Application.ScreenUpdating = True
    Application.EnableEvents = True
    Application.AskToUpdateLinks = True
    Application.DisplayAlerts = True
    Application.Calculation = xlAutomatic
    ThisWorkbook.Date1904 = False        
    Application.StatusBar = False        
End Sub

Public Sub OnStart()        
    Application.ScreenUpdating = False
    Application.EnableEvents = False
    Application.AskToUpdateLinks = False
    Application.DisplayAlerts = False
    Application.Calculation = xlAutomatic
    ThisWorkbook.Date1904 = False        
    ActiveWindow.View = xlNormalView    
End Sub

Sub getHours(uniqueArray() As Variant, Lastrow As Integer)
    Dim i As Integer, lastData As Long
    Dim tempTerms As Integer
    Dim OpenForms

    call OnStart
    code ...

    Next i

    call OnEnd

End Sub

The ScreenUpdating = False does about 90% of the job, the rest is there just to make sure it runs as expected.

Edit: Theoretically, if you change Dim tempTerms As Integer to Long it should be faster. And probably it is better to define OpenForms as something.

Upvotes: 0

Related Questions