Reputation: 1607
Building off of one of my past questions
What I'm looking to accomplish:
I'm looking to find and highlight duplicate Upcharges using VBA code based on multiple criteria:
If there is more than one instance/row in a spreadsheet that share/match ALL of these criteria then that means the Upcharge is a duplicate. As seen in my previous post linked above:
What I've tried:
scripting.dictionary
, but I'm not sure how (or if) that would work with a multi-dimensional array.Now I've finally found the method I think will be much faster,
The faster method I'm looking to use: Dumping the aforementioned columns into a multi-dimensional array, finding the duplicate "rows" in the array, then highlighting the corresponding spreadsheet rows.
My attempt at the faster method: Here's how I populate the multi-dimensional array
Sub populateArray()
Dim arrXID() As Variant, arrUpchargeOne() As Variant, arrUpchargeTwo() As Variant, arrUpchargeType() As Variant, arrUpchargeLevel() As Variant
Dim arrAllData() As Variant
Dim i As Long, lrow As Long
lrow = ActiveSheet.Cells(Rows.Count, 1).End(xlUp).Row
arrXID = Range("A2:A" & lrow) 'amend column number
arrUpchargeOne = Range("CT2:CT" & lrow)
arrUpchargeTwo = Range("CU2:CU" & lrow)
arrUpchargeType = Range("CV2:CV" & lrow)
arrUpchargeLevel = Range("CW2:CW" & lrow)
ReDim arrAllData(1 To UBound(arrXID, 1), 4) As Variant
For i = 1 To UBound(arrXID, 1)
arrAllData(i, 0) = arrXID(i, 1)
arrAllData(i, 1) = arrUpchargeOne(i, 1)
arrAllData(i, 2) = arrUpchargeTwo(i, 1)
arrAllData(i, 3) = arrUpchargeType(i, 1)
arrAllData(i, 4) = arrUpchargeLevel(i, 1)
Next i
End Sub
I can get the columns into the array, but I get stuck from there. I'm not sure how to go about checking for the duplicate "rows" in the array.
My questions:
Formula from my previous post for reference:
=AND(SUMPRODUCT(($A$2:$A$" & lastRow & "=$A2)*($CT$2:$CT$" & lastRow & "=$CT2)*($CU$2:$CU$" & lastRow & "=$CU2)*($CV$2:$CV$" & lastRow & "=$CV2)*($CW$2:$CW$" & lastRow & "=$CW2))>1,$CT2 <> """")"
Returns TRUE if Upcharge is a duplicate
Upvotes: 3
Views: 1487
Reputation:
First off, your choice of functions was inappropriate for such a large number of rows coupled with several conditions. A COUNTIFS function can perform many of the same multiple criteria operations that a SUMPRODUCT function can but in typically 25-35% of the calculation load and time. Additionally, full column references can be used without detriment in COUNTIFS as the column references are internally truncated at the limits of the Worksheet.UsedRange property.
Your standard formula can be written with COUNTIFS as,
=AND(COUNTIFS(A:A, A2, CT:CT, CT2, CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1, CT2<>"")
'... or,
=COUNTIFS(A:A, A2, CT:CT, CT2, CT:CT, "<>", CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1
Bringing the non-blank column CT condition directly into the COUNTIFS function actually improved calculation time slightly.
Only Calculate When You Have To
The original formula can be broken down to two main conditions.
A rudimentary IF function halts processing if the condition is not true. If the test for a non-blank cell in column CT is moved into a wrapping IF then the COUNTIFS (the bulk of the calculation) will only be processed if there is a value in the current row's CT column.
The improved standard formula becomes,
=IF(CT2<>"", COUNTIFS(A:A, A2, CT:CT, CT2, CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1)
The benefits for this modification depend upon the number of blank cells in column CT. If only 1% of the 15,000 cells are blank, very little improvement will be noticed. However, if 50% of the cells in column CT are typically blank there will be a substantial improvement as you are literally knocking your calculation cycles in half.
Sorting the Data to Limit the Ranges
By far, the biggest calculation parasite is with the COUNTIFS looking through 15,000 rows of data in five separate columns. If the data was sorted on one or more of the criteria columns then it becomes unnecessary to look through all 15,000 rows for matches to all five columns of criteria.
For the purpose of this exercise, it will be assumed that column A is sorted in an ascending manner. If you want to test the hypothesis discussed here, sort the data now.
The INDEX function does more than return a value; it actually returns a valid cell address. When used in its most common lookup capacity, you see the value returned but in reality, unlike a similar VLOOKUP operation which only return the cell's value, INDEX is returning the actual cell; e.g. =A1
, not the 99
that A1 contains. This hyper-functionality can be used to create valid ranges that can be used in other functions. e.g. A2:A9
can also be written as INDEX(A:A, 2):INDEX(A:A, 9)
.
This functionality cannot be used directly within a Conditional Formatting rule. However, it can be used in a Named Range and a Named Range can be used in a Conditional Formatting rule.
Sub lminyCFrule()
Debug.Print Timer
'Application.ScreenUpdating = False '<~~ uncomment this once you are no longer debugging
On Error Resume Next '<~~ needed for deleting objects without checking to see if they exist
With Worksheets("Upcharge") '<~~ you know what worksheet you are supposed to be on
If .AutoFilterMode Then .AutoFilterMode = False
'delete any existing defined name called 'localXID' or 'local200'
With .Parent
.Names("localXID").Delete
.Names("local200").Delete
End With
'create a new defined name called 'localXID' for CF rule method 1
.Names.Add Name:="localXID", RefersToR1C1:= _
"=INDEX('" & .Name & "'!C1:C104, MATCH('" & .Name & "'!RC1, '" & .Name & "'!C1, 0), 0):" & _
"INDEX('" & .Name & "'!C1:C104, MATCH('" & .Name & "'!RC1, '" & .Name & "'!C1 ), 0)"
'create a new defined name called 'local200' for CF rule method 2
.Names.Add Name:="local200", RefersToR1C1:= _
"=INDEX(Upcharge!C1:C104, MAX(2, ROW()-100), 0):INDEX(Upcharge!C1:C101, ROW()+100, 0)"
With .Cells(1, 1).CurrentRegion
'sort on column A in ascending order
.Cells.Sort Key1:=.Columns(1), Order1:=xlAscending, _
Orientation:=xlTopToBottom, Header:=xlYes
'create a CF rule on column CS
With .Resize(.Rows.Count - 1, 1).Offset(1, 96)
With .FormatConditions
.Delete
' method 1 and method 2. Only use ONE of these!
' method 1 - definitively start and end of XIDs in column A (slower, no mistakes)
'.Add Type:=xlExpression, Formula1:= _
"=IF(CT2<>"""", COUNTIFS(INDEX(localXID, 0, 1), A2, INDEX(localXID, 0, 98), CT2," & _
"INDEX(localXID, 0, 99), CU2, INDEX(localXID, 0, 100), CV2," & _
"INDEX(localXID, 0, 101), CW2)-1)"
' method 2 - best guess at start and end of XIDs in column A (faster, guesswork at true scope)
.Add Type:=xlExpression, Formula1:= _
"=IF(CT2<>"""", COUNTIFS(INDEX(local200, 0, 1), A2, INDEX(local200, 0, 98), CT2," & _
"INDEX(local200, 0, 99), CU2, INDEX(local200, 0, 100), CV2," & _
"INDEX(local200, 0, 101), CW2)-1)"
End With
.FormatConditions(.FormatConditions.Count).Interior.ColorIndex = 3
End With
'Filter based on column CS is red
.Columns(97).AutoFilter Field:=1, Criteria1:=vbRed, Operator:=xlFilterCellColor
End With
End With
Application.ScreenUpdating = True
Debug.Print Timer
End Sub
While not screaming fast, this does the job handily. The 'best guess' is faster than the 'definitive start and finish' but you run the risk of not completely covering the scope of the duplicates in column A. Of course, the offsets (e.g. 100 up and down) that control the scope could be adjusted.
Upvotes: 1
Reputation: 121
This might work like a magic trick, but not sure if it would work.
Could you just create another supportive (temporary) column, concatenating all four criteria?
ZZ_Temp = concatenate (CS; CV; CZ; etc)
This way, I suppose, you could show/highlight duplicates a lot faster.
Upvotes: 0
Reputation:
You say identify duplicates; I hear Scripting.Dictionary object.
Public Sub lminyDupes()
Dim d As Long, str As String, vAs As Variant, vCTCWs As Variant
Dim dDUPEs As Object '<~~ Late Binding
'Dim dDUPEs As New Scripting.Dictionary '<~~ Early Binding
Debug.Print Timer
Application.ScreenUpdating = False '<~~ uncomment this once you are no longer debugging
'Remove the next line with Early Binding¹
Set dDUPEs = CreateObject("Scripting.Dictionary")
dDUPEs.comparemode = vbTextCompare
With Worksheets("Upcharge") '<~~ you know what worksheet you are supposed to be on
With .Cells(1, 1).CurrentRegion
With .Resize(.Rows.Count - 1, .Columns.Count).Offset(1, 0)
.Columns(97).Interior.Pattern = xlNone '<~~ reset column CS
'the following is intended to mimic a CF rule using this formula
'=AND(COUNTIFS(A:A, A2, CT:CT, CT2, CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1, SIGN(LEN(CT2)))
vAs = .Columns(1).Value2
vCTCWs = Union(.Columns(98), .Columns(99), .Columns(100), .Columns(101)).Value2
For d = LBound(vAs, 1) To UBound(vAs, 1)
If CBool(Len(vCTCWs(d, 1))) Then
'make a key of the criteria values
str = Join(Array(vAs(d, 1), vCTCWs(d, 1), vCTCWs(d, 2), vCTCWs(d, 3), vCTCWs(d, 4)), ChrW(8203))
If dDUPEs.exists(str) Then
'the comboned key exists in the dictionary; append the current row
dDUPEs.Item(str) = dDUPEs.Item(str) & Chr(44) & "CS" & d
Else
'the combined key does not exist in the dictionary; store the current row
dDUPEs.Add Key:=str, Item:="CS" & d
End If
End If
Next d
'reuse a variant var to provide row highlighting
Erase vAs
For Each vAs In dDUPEs.keys
'if there is more than a single cell address, highlight all
If CBool(InStr(1, dDUPEs.Item(vAs), Chr(44))) Then _
.Range(dDUPEs.Item(vAs)).Interior.Color = vbRed
Next vAs
End With
End With
End With
dDUPEs.RemoveAll: Set dDUPEs = Nothing
Erase vCTCWs
Application.ScreenUpdating = True
Debug.Print Timer
End Sub
This seems faster than the formula approach.
¹ If you plan to convert the late binding of the Scripting.Dictionary object to early binding, you must add Microsoft Scripting Runtime to the VBE's Tools ► References.
Upvotes: 4
Reputation: 107767
Consider an SQL solution as this is a typical aggregate group by query where you filter for counts greater than 1. To go about your route requires many conditional logic within the loop across all elements of array.
While I recommend you simply import your data into a database like Excel's sibling MS Access, Excel can run SQL statements on its own workbook using an ADO connection (not to get into particulars but both Excel and Access uses the same Jet/ACE engine). And one good thing is you seem to be set up to run such a query with the table like structure of named columns.
The below example references your fields in a worksheet called Data (Data$
) and query outputs to a worksheet called Results (with headers). Change names as needed. Two connection strings are included (one of which is commented out). Hopefully it runs on your end!
Sub RunSQL()
Dim conn As Object, rst As Object
Dim i As Integer, fld As Object
Dim strConnection As String, strSQL As String
Set conn = CreateObject("ADODB.Connection")
Set rst = CreateObject("ADODB.Recordset")
' Connection and SQL Strings
' strConnection = "DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
' & "DBQ=C:\Path\To\Workbook.xlsm;"
strConnection = "Provider=Microsoft.ACE.OLEDB.12.0;" _
& "Data Source='C:\Path\To\Workbook.xlsm';" _
& "Extended Properties=""Excel 8.0;HDR=YES;"";"
strSQL = " SELECT [Data$].[Product's XID], [Data$].[Upcharge Criteria 1]," _
& " [Data$].[Upcharge Criteria 2], [Data$].[Upcharge Type]," _
& " [Data$].[Upcharge Type], [Data$].[Upcharge Level]" _
& " FROM [Data$]" _
& " GROUP BY [Data$].[Product's XID], [Data$].[Upcharge Criteria 1]," _
& " [Data$].[Upcharge Criteria 2], [Data$].[Upcharge Type]," _
& " [Data$].[Upcharge Type], [Data$].[Upcharge Level]," _
& " [Data$].[Product's XID]" _
& " HAVING COUNT(*) > 1;"
' Open the db connection
conn.Open strConnection
rst.Open strSQL, conn
' Column headers
i = 0
Worksheets("Results").Range("A1").Activate
For Each fld In rst.Fields
ActiveCell.Offset(0, i) = fld.Name
i = i + 1
Next fld
' Data rows
Worksheets("Results").Range("A2").CopyFromRecordset rst
rst.Close
conn.Close
End Sub
Upvotes: 0
Reputation: 1975
Why don't you remove the Indirect()
and replace the Countif()
function with some stable Row
reference. Since Indirect()
part is a volatile and instead of using Indirect()
you can straight away use some stable row reference like $A$2:$A$50000
which may show some significant change in performance.
Or
Use Create Table for your data. Use Table reference in your formula which will work faster than Indirect()
reference.
Edit
Your actual formula
=AND(SUMPRODUCT(($A$2:$A$500=$A2)*($CU$2:$CU$500=$CU2)*($CV$2:$CV$500=$CV2)*($CW$2:$CW$500=$CW2)*($CX$2:$CX$500=$CX2))>1,$CU2 <> "")
Why don't you convert it to Counti(S)
with stable reference like the below?
=AND(COUNTIFS($A$2:$A$500,$A2,$CU$2:$CU$500,$CU2,$CV$2:$CV$500,$CV2,$CW$2:$CW**$500,$CW2,$CX$2:$CX$500,$CX2)>1,$CU12<>"")
Upvotes: 0