Reputation: 37
I have a very large data set (600,000 rows) of data structured in the following format:
1) There are around 60 products. One is a Total US number, while the others are for Manufacturers and are labled as KMFs. There are also some labeled as PCKGs(but aren't relevant for this question)
2) Each product is located in 60 different markets
3) Each market has 20 different locations
4) I have 12 metrics for which I need to calculate data in the following manner: Total US number - sum(KMFs) for each metric
I have written vba code for this but it is taking too long to run(around 20 minutes) I need to run similar code on at least 20 worksheets. I have tried various methods such as setting screenUpdating etc. to false. Here is my code. I am new to vba coding so I may have missed obvious things. Please let me know anything is unclear. Please help!
Sub beforeRunningCode()
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
ActiveSheet.DisplayPageBreaks = False
End Sub
Sub returnToOriginal()
Application.ScreenUpdating = screenUpdateState
Application.DisplayStatusBar = statusBarState
Application.Calculation = calcState
Application.EnableEvents = eventsState
ActiveSheet.DisplayPageBreaks = displayPageBreaksState
End Sub
Function LastRowFunc(Sheet) As Long
LastRowFunc = ActiveWorkbook.Worksheets(Sheet).Range("A2", Worksheets(Sheet).Range("A2").End(xlDown)).Rows.Count
End Function
Function LastColFunc(Sheet) As Long
With ActiveSheet
LastColFunc = ActiveWorkbook.Sheets(Sheet).Cells(1, .Columns.Count).End(xlToLeft).Column
End With
End Function
Sub AOCalculate()
Call beforeRunningCode 'Optimize Excel
Dim LastRow As Long
Dim LastCol As Long
Dim Period As String
Dim Sheet As String
Dim Arr(1 To 16)
Dim Count As Integer
Sheet = "Energy_LS_Bottler"
Period = "2016 WAVE 1 - 3 W/E 05/07"
LastRow = LastRowFunc(Sheet) 'Calculate last row for which data exists
LastCol = LastColFunc(Sheet) 'Calculate last column for which data exists
For Each Location In ActiveWorkbook.Sheets("Locations").Range("D7:D28").Value
For Each Market In ActiveWorkbook.Sheets("Markets").Range("A5:A92").Value
Count = Count + 1
Arr(1) = Market
Arr(2) = "AO"
Arr(3) = Location
Arr(4) = Period
With ActiveWorkbook.Sheets(Sheet) 'Filtering for KMF
.AutoFilterMode = False
.Range(Cells(1, 1), Cells(LastRow, LastCol)).AutoFilter
.Range(Cells(1, 1), Cells(LastRow, LastCol)).AutoFilter field:=17, Criteria1:="=KMF"
.Range(Cells(1, 1), Cells(LastRow, LastCol)).AutoFilter field:=1, Criteria1:=Market
.Range(Cells(1, 1), Cells(LastRow, LastCol)).AutoFilter field:=3, Criteria1:=Location
End With
For k = 5 To 16
Arr(k) = Application.WorksheetFunction.Sum(ActiveWorkbook.Sheets(Sheet).Range(Cells(1, k), Cells(LastRow, k)).SpecialCells(xlCellTypeVisible))
Next k
With ActiveWorkbook.Sheets(Sheet) ' filtering for Total US
.AutoFilterMode = False
.Range(Cells(1, 1), Cells(LastRow, LastCol)).AutoFilter
.Range(Cells(1, 1), Cells(LastRow, LastCol)).AutoFilter field:=17, Criteria1:="=Total US"
.Range(Cells(1, 1), Cells(LastRow, LastCol)).AutoFilter field:=1, Criteria1:=Market
.Range(Cells(1, 1), Cells(LastRow, LastCol)).AutoFilter field:=3, Criteria1:=Location
End With
For k = 5 To 16
Arr(k) = -Arr(k) + Application.WorksheetFunction.Sum(ActiveWorkbook.Sheets(Sheet).Range(Cells(1, k), Cells(LastRow, k)).SpecialCells(xlCellTypeVisible))
Next k
For j = 1 To 16
ActiveWorkbook.Sheets(Sheet).Cells(LastRow + Count, j).Value = Arr(j)
Next j
Erase Arr
Next
Next
ActiveWorkbook.Sheets(Sheet).AutoFilterMode = False
Call returnToOriginal
End Sub
[Edit]: Here is a link to a sample data set https://drive.google.com/file/d/0B3MkGa57h6g_WGl2WWlWekd4NU0/view?usp=sharing
Upvotes: 1
Views: 107
Reputation: 56725
I think that this will work (though I haven't had a chance to test it), and should be a lot faster:
Sub AOCalculate()
Call beforeRunningCode 'Optimize Excel
Dim LastRow As Long
Dim LastCol As Long
Dim Period As String
Dim Sheet As String
Dim Arr() '1 To 2000, 1 To 16)
Dim Count As Integer
Sheet = "Energy_LS_Bottler"
Period = "2016 WAVE 1 - 3 W/E 05/07"
LastRow = LastRowFunc(Sheet) 'Calculate last row for which data exists
LastCol = LastColFunc(Sheet) 'Calculate last column for which data exists
'copy all of the relevant cells to local arrays for speed
Dim Locations(), Markets(), data()
Markets = ActiveWorkbook.Sheets("Markets").Range("A5:A92").Value
Locations = ActiveWorkbook.Sheets("Locations").Range("D7:D28").Value
'(pretty sure the following line needs to localize the Cells() to .Cells())
data = ActiveWorkbook.Sheets(Sheet).Range(Cells(1, 1), Cells(LastRow, LastCol)).Value '**'
ReDim Arr(1 To UBound(Markets, 1) * UBound(Locations, 1), 16)
'make an index of pointers into our accumulation array
Dim counts As New Collection
Dim i As Long, l As Long, m As Long
For l = 1 To UBound(Locations, 1)
Location = Locations(l, 1) '**'
For m = 1 To UBound(Markets, 1)
Market = Markets(m, 1) '**'
i = i + 1
counts.Add i, CStr(Location) & "~" & CStr(Market)
'counts.Add NewAccumArray(Location, Market, Period), CStr(Location) & "~" & CStr(Market)
Arr(i, 1) = Market
Arr(i, 2) = "AO"
Arr(i, 3) = Location
Arr(i, 4) = Period
Next
Next
' go through each row and add it to the appropiate count in the array
Dim r As Long
Dim key As String, idx As Long
For r = 1 To UBound(data, 1)
key = CStr(data(r, 3)) & "~" & CStr(data(r, 1))
If data(r, 17) = "KMF" Then
idx = counts(key)
For k = 5 To 16
Arr(idx, k) = Arr(idx, k) - data(r, k)
Next k
Else
If data(r, 17) = "Total US" Then
idx = counts(key)
For k = 5 To 16
Arr(idx, k) = Arr(idx, k) + data(r, k)
Next k
End If
End If
Next r
' output the results
ActiveWorkbook.Sheets(Sheet).Range(Cells(LastRow + 1, 1), Cells(LastRow + Count, 16)).Value = Arr
ActiveWorkbook.Sheets(Sheet).AutoFilterMode = False
Call returnToOriginal
End Sub
Answering the query "What did I mean by this?"
'(pretty sure the following line needs to localize the Cells() to .Cells())
data = ActiveWorkbook.Sheets(Sheet).Range(Cells(1, 1), Cells(LastRow, LastCol)).Value '**'
The use of Cells(..)
here is fundamentally unreliable and broken. this is because Cells(..)
is really a shortcut for ActiveSheet.Cells(..)
and the Active* properties are inherently slow and unreliable because they can change while the code is running. Worse, this code is assuming that ActiveSheet = Energy_LS_Blotter
which is far from certain.
The correct way to write this line would be like this:
data = ActiveWorkbook.Sheets(Sheet).Range( _
ActiveWorkbook.Sheets(Sheet).Cells(1, 1), _
ActiveWorkbook.Sheets(Sheet).Cells(LastRow, LastCol) _
).Value
But that is long, ugly and inconvenient. An easier way would be to use either a Sheet variable, or a With
:
With ActiveWorkbook.Sheets(Sheet)
data = .Range(.Cells(1, 1), .Cells(LastRow, LastCol)).Value
End With
Upvotes: 2