drcoding
drcoding

Reputation: 173

VBA and Excel optimization of script, dealing with 700,000 rows

Hello StackOverflowers,

I am currently working on a script that has one nested IF statement in it. When run it could potentially compute around 1.4m IF's.

I have run a test with a timer (not too sure on the accuracy of the timer in VBA) and crunching 1000 rows gives me a time of 10 seconds. 10 * 700 = 7000 seconds, which = 1.94 hours.

Can anyone give me any tips for optimisation when dealing with such large data sets?

My code is as follows

Sub itS1Capped()
    Dim Start, Finish, TotalTime
    Start = Timer
    Dim c, d, j, lastRow
    c = 1

    'find how many rows
    With Worksheets("Data")
    lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
    End With

    'loop through all rows
    For Each d In Worksheets("Data").Range("D2:D" & lastRow).Cells 'd = IT S0 Uncapped

        j = Worksheets("Data").Range("J" & c + 1).Value  'IT Cap
        If j <> 0 Then

            If d > j Then
                Worksheets("Data").Range("K" & c + 1).Value = j 'IT S1 Capped = j
            Else
                 Worksheets("Data").Range("K" & c + 1).Value = d 'IT S1 Capped = d
            End If
        Else
            Worksheets("Data").Range("K" & c + 1).Value = d 'IT S1 Capped = d
        End If
        c = c + 1
    Next
    Finish = Timer
    TotalTime = Finish - Start
    MsgBox TotalTime
End Sub

Upvotes: 1

Views: 2392

Answers (5)

Vincent Courtemanche
Vincent Courtemanche

Reputation: 308

So I took inspiration from Mark Moore's use of arrays and found that using an array function rather than copying and pasting a plain function across a range is much faster. On my machine, Mark's procedure runs in 2.2 seconds, and the one below in 1.4 seconds.

Sub FormulaArray()
    Dim iUsedRows As Long, rCell As Range, StartTimer As Double, Duration As Double

    StartTimer = Timer
    iUsedRows = ActiveSheet.UsedRange.Cells(ActiveSheet.UsedRange.Rows.Count, 1).Row

    With Range(Cells(1, 11), Cells(iUsedRows, 11))
        .FormulaArray = "=IF(J:J<>0,IF(D:D>J:J,J:J,D:D),D:D)"
        .Copy
        .PasteSpecial xlPasteValues
    End With

    Duration = StartTimer - Timer
    MsgBox Format(Duration, "#0.0000") & " seconds to run"

End Sub

Upvotes: 3

peege
peege

Reputation: 2477

I'm not sure if it will make a difference, but since you are timing it, I'd be interested to know.

I modified your code slightly. The main change is For each D in worksheets. Otherwise, I used Cells(row,col) instead of Range. Not that I expect that change to save time, I just thought you might like to see another way of defining cells, instead of concatenating letters and numbers.
note: with cells, you can use all variables, and numbers, with no letters. I just used letters to show you the similarities.

also, Since you have a c + 1 in every row, why not also just start on row 2, leave out the multiple (+1s) and go from there?

UN-TESTED

Sub itS1Capped()
    Dim Start, Finish, TotalTime    'What are you declaring these variables as?
    Dim c, d, j, lastRow

    Start = Timer

    'find how many rows
    lastRow = Sheets("Data").Cells(Rows.Count, "A").End(xlUp).row

    'loop through all rows
    For c = 2 To lastRow                             'c = IT S0 Uncapped  (OLD d)

        j = Sheets("Data").Cells(c, "J").Value          'IT Cap   = Cells(c, 10)
        If j <> 0 Then

            If c > j Then
                Sheets("Data").Cells(c, "K").Value = j  'IT S1 Capped = j    
            Else
                Sheets("Data").Cells(c, "K").Value = c  'IT S1 Capped = c
            End If
        Else
            Sheets("Data").Cells(c, "K").Value = c      'IT S1 Capped = c
        End If
    Next c
    Finish = Timer
    TotalTime = Finish - Start
    MsgBox TotalTime
End Sub

edit: replaced d with c

Upvotes: 0

Vincent Courtemanche
Vincent Courtemanche

Reputation: 308

I cannot test this right now, but i believe if you write a function to replace your nested IF statements, add it to Range("K2") with

Range("K2").Formula = ...

then copy it down to Cells(lastrow, "K"), copy all the functions and paste as values it'll be much faster.

Of course using

Application.Calculation = xlCalculationManual

and

Application.Calculation = xlCalculationAutomatic

like enemy suggested, along with

Application.screenupdate = false

Might make it slightly faster, but I think the function-copy-paste will make the biggest difference.

I don't have time to post updated code at the moment, but hopefully I'll get to it tomorrow.

Hope that helps!

EDIT: Here's the revised code

WARNING: I haven't been able to test this code yet. I'll do so tomorrow and revise if needed.

Sub FunctionCopyPaste()
    Dim iLastRow as Integer

    With Worksheets("Data")
        iLastRow = .UsedRange.Cells(.UsedRange.Rows.Count,1).Row
        .Range("K2").Formula = "=IF(J2<>0,IF(D2>J2,J2,D2),D2)"
        .Range("K2").Copy Range(Cells(2,4), Cells(iLastRow,1).Row,4))
    End With

    With Range(Cells(2,4), Cells(iLastRow,4))
        .Copy
        .PasteSpecial xlPasteValues
    End With

End Sub

Upvotes: 0

Mark Moore
Mark Moore

Reputation: 510

I am a bit old school and so "arrays" are your friend :-) Have had similar problems when I first took over looking after some pretty complex spreadsheets at work that did large numbers of validations. When working with large volumes of data, moving between the workbook and the data on the worksheet is not recommended, because each action is effectively an I/O (Input/ output) operation and these are very time consuming. It is massively more efficient to read all your data into an array, work with the array data and then write it back to the sheet at the end, this is effectively 2 I/O's instead of the 700,000 if you read the sheet data each time. As a rough example, I reduced our previous validation time down from 25 minutes to 4 seconds using this approach.

Sub ValidateSheet()
Dim DataRange As String
Dim SheetArray As Variant
Dim StartCol As String
Dim EndCol As String
Dim StartRow As Long ' long to cope with over 32k records
Dim lastrow As Long
Dim WorksheetToRead As String
Dim ArrayLoopCounter As Long
Dim Start, Finish, TotalTime

    Start = Timer

    'I use variables for the data range simply to allow it to be changed easily.  My real code is actually paramatised so a single reusable procedure
    'is used to populate all my arrays
    'find how many rows
    WorksheetToRead = "Data"
    StartCol = "A"
    EndCol = "Z"
    StartRow = 1
    lastrow = Sheets(WorksheetToRead).Cells(Rows.Count, "A").End(xlUp).Row

    'set the range to be read into the array
    DataRange = StartCol & Trim(Str(StartRow)) & ":" & EndCol & Trim(Str(StartRow - 1 + lastrow))
    SheetArray = Worksheets(WorksheetToRead).Range(DataRange).Value ' read all the values at once from the Excel grid, put into an array
    'Loop around the data
    For ArrayLoopCounter = LBound(SheetArray, 1) To UBound(SheetArray, 1)
        If SheetArray(ArrayLoopCounter, 10) <> 0 Then '10 is column J
            'Compare D with J
            If SheetArray(ArrayLoopCounter, 4) > SheetArray(ArrayLoopCounter, 10) Then '10 is column J
                SheetArray(ArrayLoopCounter, 11) = SheetArray(ArrayLoopCounter, 10) 'set col K = Col J
            Else
                SheetArray(ArrayLoopCounter, 11) = SheetArray(ArrayLoopCounter, 4) 'set col K = Col D
            End If
        Else
            SheetArray(ArrayLoopCounter, 11) = SheetArray(ArrayLoopCounter, 4) 'set col K = Col D
        End If
    Next ArrayLoopCounter
    'Write the updated array back to the sheet
    Worksheets(WorksheetToRead).Range(DataRange) = SheetArray

    Finish = Timer
    TotalTime = Finish - Start
    MsgBox TotalTime
End Sub

Upvotes: 2

nemmy
nemmy

Reputation: 751

Have you tried turning off automatic recalculation before you run your script?

Application.Calculation = xlCalculationManual

And then turn it back on when you're done?

Application.Calculation = xlCalculationAutomatic

That usually speeds up processing of lots of rows assuming you're not changing something that needs recalculating before you work on the next (or subsequent) rows.

Upvotes: -1

Related Questions