Reputation: 3914
I was working on a solution to another question of mine when I stumbled across this helpful question and answer. However, implementing the answer given by Control Freak over there throws a Type Mismatch error as soon as I exit the function and return to my code on the line: Years = ReDimPreserve(Years, i, 3). I'm not a skilled enough programmer to figure out what is going wrong here, so can anybody shed some light on this?
Here is my code:
Sub DevideData()
Dim i As Integer
Dim Years() As String
ReDim Years(1, 3)
Years(1, 1) = Cells(2, 1).Value
Years(1, 2) = 2
i = 2
ThisWorkbook.Worksheets("Simple Boundary").Activate
TotalRows = ThisWorkbook.Worksheets("Simple Boundary").Range("A100000").End(xlUp).row
For row = 3 To TotalRows
Years = ReDimPreserve(Years, i, 3)
If Not Cells(row, 1).Value = Cells(row - 1, 1).Value Then
Years(i - 1, 3) = row - 1
Years(i, 1) = Cells(row, 1).Value
Years(i, 2) = row
i = i + 1
End If
Next row
End Sub
And here is the function as written by Control Freak:
Public Function ReDimPreserve(aArrayToPreserve, nNewFirstUBound, nNewLastUBound)
ReDimPreserve = False
'check if its in array first
If IsArray(aArrayToPreserve) Then
'create new array
ReDim aPreservedArray(nNewFirstUBound, nNewLastUBound)
'get old lBound/uBound
nOldFirstUBound = UBound(aArrayToPreserve, 1)
nOldLastUBound = UBound(aArrayToPreserve, 2)
'loop through first
For nFirst = LBound(aArrayToPreserve, 1) To nNewFirstUBound
For nLast = LBound(aArrayToPreserve, 2) To nNewLastUBound
'if its in range, then append to new array the same way
If nOldFirstUBound >= nFirst And nOldLastUBound >= nLast Then
aPreservedArray(nFirst, nLast) = aArrayToPreserve(nFirst, nLast)
End If
Next
Next
'return the array redimmed
If IsArray(aPreservedArray) Then ReDimPreserve = aPreservedArray
End If
End Function
Upvotes: 0
Views: 1073
Reputation: 12403
I promised a fuller answer. Sorry it is later than I expected:
As I said in my first comment:
Public Function ReDimPreserve(aArrayToPreserve, nNewFirstUBound, nNewLastUBound)
causes aArrayToPreserve to have the default type of Variant. This does not match:
Dim Years() As String
As you discovered, redefining Years as a Variant fixes the problem. An alternative approach would be to amend the declaration of ReDimPreserve so aArrayToPreserve is an array of type String. I would not recommend that approach since you are storing both strings and numbers in the array. A Variant array will handle either strings or numbers while a String array can only handle numbers by converting them to strings for storage and back to numbers for processing.
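The mismatch can be reproduced in isolation. The following is a minimal sketch rather than the original code: MakeVariantArray and DemoTypeMismatch are hypothetical names, and the sketch assumes that, like ReDimPreserve, the function hands back a Variant holding an array of Variants.

```vba
Option Explicit

' Hypothetical stand-in for ReDimPreserve: returns a Variant
' that holds a two-dimensional array of Variants.
Function MakeVariantArray() As Variant
    Dim Result() As Variant
    ReDim Result(1 To 2, 1 To 3)
    Result(1, 1) = "AAAA"   ' a string
    Result(1, 2) = 2        ' a number, stored without conversion
    MakeVariantArray = Result
End Function

Sub DemoTypeMismatch()
    Dim YearsAsVariant() As Variant
    Dim YearsAsString() As String

    ' Element types match, so this assignment succeeds.
    YearsAsVariant = MakeVariantArray()
    Debug.Print "Variant array assigned, rows: " & UBound(YearsAsVariant, 1)

    ' A String array cannot receive an array of Variants.
    ' The error is trapped here so the demo can report it.
    On Error Resume Next
    YearsAsString = MakeVariantArray()
    If Err.Number <> 0 Then
        Debug.Print "Error " & Err.Number & ": " & Err.Description
    End If
    On Error GoTo 0
End Sub
```

Running DemoTypeMismatch should show the first assignment succeeding and the second being reported as an error, which is the behaviour described in the question.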
I tried your macro with different quantities of data and different amendments and timed the runs:
Rows of data   Amendment                             Duration of run
3,500          Years() changed to Variant            4.99 seconds
35,000         Years() changed to Variant            502 seconds
35,000         aArrayToPreserve changed to String    656 seconds
As I said in my second comment, ReDim Preserve is slow for both the inbuilt method and the VBA routine you found. For every call it must create a new array of the new size, copy every value from the old array to the new one and release the old array.
ReDim Preserve is a very useful method but it must be used with extreme care. Sometimes I find that sizing an array to the maximum at the beginning and using ReDim Preserve to cut the array down to the used size at the end is a better technique. The best techniques shown below determine the number of entries required before sizing the array.
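The "size to the maximum, then cut down" idea can be sketched as follows. This is an illustration under stated assumptions, not code from the macros in this answer: GroupRuns and DemoGroupRuns are hypothetical names, the input is assumed to be a one-dimensional array with a lower bound of 1, and equal values are assumed to be adjacent. Rows are held in the last dimension because that is the only dimension ReDim Preserve can resize.

```vba
Option Explicit

' Hypothetical helper. Values: 1-based, one-dimensional, equal values adjacent.
' Returns a (1 To 3, 1 To NumGroups) array:
'   row 1 = value, row 2 = first index, row 3 = last index.
Function GroupRuns(Values As Variant) As Variant
    Dim Groups() As Variant
    Dim IdxCrnt As Long
    Dim NumGroups As Long

    ' Worst case is that every value differs, so size for the maximum once.
    ReDim Groups(1 To 3, 1 To UBound(Values))

    NumGroups = 1
    Groups(1, 1) = Values(1)
    Groups(2, 1) = 1
    For IdxCrnt = 2 To UBound(Values)
        If Values(IdxCrnt) <> Values(IdxCrnt - 1) Then
            Groups(3, NumGroups) = IdxCrnt - 1   ' close the previous group
            NumGroups = NumGroups + 1
            Groups(1, NumGroups) = Values(IdxCrnt)
            Groups(2, NumGroups) = IdxCrnt
        End If
    Next
    Groups(3, NumGroups) = UBound(Values)        ' close the final group

    ' A single ReDim Preserve trims the last dimension to the used size.
    ReDim Preserve Groups(1 To 3, 1 To NumGroups)
    GroupRuns = Groups
End Function

Sub DemoGroupRuns()
    Dim Sample(1 To 6) As Variant
    Dim Result As Variant
    Dim IdxCrnt As Long

    Sample(1) = "A": Sample(2) = "A": Sample(3) = "B"
    Sample(4) = "C": Sample(5) = "C": Sample(6) = "C"

    Result = GroupRuns(Sample)
    For IdxCrnt = 1 To UBound(Result, 2)
        Debug.Print Result(1, IdxCrnt) & "|" & Result(2, IdxCrnt) & "|" & Result(3, IdxCrnt)
    Next
End Sub
```

The copy that ReDim Preserve performs happens once here rather than once per row, which is the point of the technique.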
At the bottom of your routine, I added:
For i = LBound(Years, 1) To LBound(Years, 1) + 9
Debug.Print Years(i, 0) & "|" & Years(i, 1) & "|" & Years(i, 2) & "|" & Years(i, 3)
Next
For i = UBound(Years, 1) - 9 To UBound(Years, 1)
Debug.Print Years(i, 0) & "|" & Years(i, 1) & "|" & Years(i, 2) & "|" & Years(i, 3)
Next
This resulted in the following being output to the Immediate Window:
|||
|AAAA|2|2
|AAAB|3|4
|AAAC|5|7
|AAAD|8|11
|AAAE|12|16
|AAAF|17|22
|AAAG|23|23
|AAAH|24|25
|AAAI|26|28
|AOUJ|34973|34976
|AOUK|34977|34981
|AOUL|34982|34987
|AOUM|34988|34988
|AOUN|34989|34990
|AOUO|34991|34993
|AOUP|34994|34997
|AOUQ|34998|35002
|AOUR|35003|
|||
Since you have called the array Years, I doubt my string values are anything like yours. This does not matter. What matters is that I doubt this output was exactly what you wanted.
If you write:
ReDim Years(1, 3)
The lower bounds are set to the value specified by the Option Base statement or zero if there is no Option Base statement. You have lower bounds of zero for both dimensions which you do not use. This is the reason for the “|||” at the top. There is another “|||” at the end which means you are creating a final row which you are not using. The final used row does not have an end row which I assume is a mistake.
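The effect of omitting the lower bounds can be checked with LBound and UBound. A minimal sketch, in which DescribeBounds and DemoLowerBounds are hypothetical names and the module is assumed to contain no Option Base statement:

```vba
Option Explicit

' Hypothetical helper: reports the bounds of a two-dimensional array.
Function DescribeBounds(Arr As Variant) As String
    DescribeBounds = LBound(Arr, 1) & " To " & UBound(Arr, 1) & ", " & _
                     LBound(Arr, 2) & " To " & UBound(Arr, 2)
End Function

Sub DemoLowerBounds()
    Dim BoundsOmitted() As Variant
    Dim BoundsStated() As Variant

    ' Lower bounds come from Option Base, or are zero without one.
    ReDim BoundsOmitted(1, 3)
    ' Lower bounds are stated, so Option Base has no effect.
    ReDim BoundsStated(1 To 1, 1 To 3)

    Debug.Print "Omitted: " & DescribeBounds(BoundsOmitted)
    Debug.Print "Stated:  " & DescribeBounds(BoundsStated)
End Sub
```

With the bounds omitted, the array gains a row and a column numbered zero that the code never fills.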
When I can divide a routine into steps, I always validate the result of one step before advancing to the next. That way, I know any problems are within the current step and not the result of an error in an earlier step. I use Debug.Print to output to the Immediate Window most of the time. Only if I want to output a lot of diagnostic information will I write to a text file. Either way, blocks of code like mine are a significant aid to rapid debugging of a macro.
I would never write ReDim Years(1, 3). I always specify the lower bound so as to be absolutely clear. VBA is the only language I know where you can specify any value for the lower bound (providing it does not exceed the upper bound) so I will specify non-standard values if it is helpful for a particular problem. In this case, I see no advantage to a lower bound other than one so that is what I have used.
With two-dimensional arrays it is conventional to have columns as the first dimension and rows as the second. One exception is for arrays read from or to be written to a worksheet for which the dimensions are the other way round. You have rows as the first dimension. If you had used the conventional sequence you could have used the ReDim Preserve method, which can only resize the last dimension, thereby avoiding the ReDimPreserve function and the problem of non-matching types.
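With rows in the last dimension, the inbuilt statement is enough to add an entry. A minimal sketch, where AppendYear and DemoAppendYear are hypothetical names and the array is assumed to be laid out as (1 To 3, 1 To number of entries):

```vba
Option Explicit

' Hypothetical helper: adds one entry to a (columns, rows) array.
' Only the upper bound of the last dimension changes, which is
' the one change ReDim Preserve permits.
Sub AppendYear(Years() As Variant, YearName As Variant, RowStart As Long)
    Dim RowYearsNew As Long

    RowYearsNew = UBound(Years, 2) + 1
    ReDim Preserve Years(1 To 3, 1 To RowYearsNew)
    Years(1, RowYearsNew) = YearName
    Years(2, RowYearsNew) = RowStart
End Sub

Sub DemoAppendYear()
    Dim Years() As Variant

    ReDim Years(1 To 3, 1 To 1)
    Years(1, 1) = "AAAA"
    Years(2, 1) = 2
    Years(3, 1) = 2

    AppendYear Years, "AAAB", 3

    ' The first entry survives the resize.
    Debug.Print Years(1, 1) & "|" & Years(2, 1) & "|" & Years(3, 1)
    Debug.Print Years(1, 2) & "|" & Years(2, 2) & "|" & Years(3, 2)
End Sub
```

This removes the type problem but not the cost: each call still copies the whole array, so calling it once per worksheet row would remain slow.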
Technique 1
I expected this to be the fastest technique. Experts advise us to avoid “re-inventing the wheel”. That is, if Excel has a routine that will do what you want, don’t code an alternative in VBA. However, I have found a number of examples where this is not true and I discovered this technique was one of them.
The obvious technique here is to use Filter, then create a range of the visible rows using SpecialCells and finally process each row in this range. I have used this technique very successfully to meet other requirements but not here.
I did not know the VBA to select unique rows so started the macro recorder and filtered my test data from the keyboard to get:
Range("A1:A35000").AdvancedFilter Action:=xlFilterInPlace, Unique:=True
My past uses of Filter have all converted to AutoFilter which I have found to give acceptable performance. This converted to AdvancedFilter which took 20 seconds both from the keyboard and from VBA. I do not know why it is so slow.
The second problem was that:
Set RngUnique = .Range(.Cells(1, 1), .Cells(RowLast, 1)) _
.SpecialCells(xlCellTypeVisible)
was rejected as “too complicated”.
Not being able to get the visible rows as a range means the benefits of Filter are not really available. I have counted the visible rows to simulate having RngUnique.Rows.Count. This shows the technique which has always worked with AutoFilter. If AdvancedFilter had reported the unique rows in an acceptable time I might have investigated this problem but under the circumstances it does not seem worth the effort.
The macro demonstrating this technique is:
Option Explicit
Sub Technique1()
' * Avoid using meaningless names like i. Giving every variable a meaningful
' name is helpful during development and even more helpful when you return
' to the macro in six months for maintenance.
' * My naming convention is to use a sequence of keywords. The first keyword
' identifies what type of data the variable holds. So "Row" means it holds
' a row number. Each subsequent keyword narrows the scope. "RowSb" is a
' row of the worksheet "Simple Boundary" and "RowYears" is a row of the Years
' array. "RowSbCrnt" is the current row of the worksheet "Simple Boundary".
' * I can look at macros I wrote years ago and know what all the variables are.
' You may not like my convention. Fine, develop your own but do not
' try programming with random names.
' * Avoid data type Integer which specifies a 16-bit whole number and requires
' special processing on 32 and 64-bit computers. Long is now the recommended
' data type for whole numbers.
Dim NumRowsVisible As Long
Dim RowSbCrnt As Long
Dim RowSbLast As Long
Dim RowYearsCrnt As Long
Dim TimeStart As Double
Dim Years() As Variant
TimeStart = Timer ' Get the time as seconds since midnight to nearest .001
' of a second
' This can save significant amounts of time if the macro amends the
' screen or switches between workbooks.
Application.ScreenUpdating = False
With Worksheets("Simple Boundary")
' Rows.Count avoids having to guess how many rows will be used
RowSbLast = .Cells(Rows.Count, "A").End(xlUp).Row
' Hide non-unique rows
With .Range(.Cells(1, 1), .Cells(RowSbLast, 1))
.AdvancedFilter Action:=xlFilterInPlace, Unique:=True
End With
' Count number of unique rows.
' It is difficult to time small pieces of code because OS routines
' can execute at any time. However, this count takes less than .5
' of a second with 35,000 rows.
NumRowsVisible = 0
For RowSbCrnt = 2 To RowSbLast
If Not .Rows(RowSbCrnt).Hidden Then
NumRowsVisible = NumRowsVisible + 1
End If
Next
' Use count to ReDim array to final size.
ReDim Years(1 To 3, 1 To NumRowsVisible)
RowYearsCrnt = 1
Years(1, RowYearsCrnt) = .Cells(2, 1).Value
Years(2, RowYearsCrnt) = 2
For RowSbCrnt = 3 To RowSbLast
If Not .Rows(RowSbCrnt).Hidden Then
Years(3, RowYearsCrnt) = RowSbCrnt - 1
RowYearsCrnt = RowYearsCrnt + 1
Years(1, RowYearsCrnt) = .Cells(RowSbCrnt, 1).Value
Years(2, RowYearsCrnt) = RowSbCrnt
End If
Next
' Record final row for final string
Years(3, RowYearsCrnt) = RowSbLast
.ShowAllData ' Clear AdvancedFilter
End With
Application.ScreenUpdating = True
Debug.Print "Duration: " & Format(Timer - TimeStart, "#,##0.000")
' Output diagnostics
For RowYearsCrnt = 1 To 9
Debug.Print Years(1, RowYearsCrnt) & "|" & _
Years(2, RowYearsCrnt) & "|" & _
Years(3, RowYearsCrnt) & "|"
Next
' Note that rows are now in the second dimension hence the 2 in UBound(Years, 2)
For RowYearsCrnt = UBound(Years, 2) - 9 To UBound(Years, 2)
Debug.Print Years(1, RowYearsCrnt) & "|" & _
Years(2, RowYearsCrnt) & "|" & _
Years(3, RowYearsCrnt) & "|"
Next
End Sub
The output to the Immediate Window is:
Duration: 20.570
AAAA|2|2|
AAAB|3|4|
AAAC|5|7|
AAAD|8|11|
AAAE|12|16|
AAAF|17|22|
AAAG|23|23|
AAAH|24|25|
AAAI|26|28|
AOUI|34970|34972|
AOUJ|34973|34976|
AOUK|34977|34981|
AOUL|34982|34987|
AOUM|34988|34988|
AOUN|34989|34990|
AOUO|34991|34993|
AOUP|34994|34997|
AOUQ|34998|35002|
AOUR|35003|35008|
As you can see the last row is correct. A duration of 20 seconds is better than the 8 minutes of your technique but I am sure we can do better.
Technique 2
The next macro is similar to the last one but it counts the unique rows rather than use AdvancedFilter to hide the non-unique rows. This macro has a duration of 1.5 seconds with 35,000 rows. This demonstrates that counting how many rows are required for an array in a first pass of the data is a viable approach. The diagnostic output from this macro is the same as above.
Sub Technique2()
Dim NumRowsUnique As Long
Dim RowSbCrnt As Long
Dim RowSbLast As Long
Dim RowYearsCrnt As Long
Dim TimeStart As Double
Dim Years() As Variant
TimeStart = Timer ' Get the time as seconds since midnight to nearest .001
' of a second
With Worksheets("Simple Boundary")
RowSbLast = .Cells(Rows.Count, "A").End(xlUp).Row
' Count number of unique rows.
' Assume all data rows are unique until found otherwise
NumRowsUnique = RowSbLast - 1
For RowSbCrnt = 3 To RowSbLast
If .Cells(RowSbCrnt, 1).Value = .Cells(RowSbCrnt - 1, 1).Value Then
NumRowsUnique = NumRowsUnique - 1
End If
Next
' * Use count to ReDim array to final size.
' * Note that I have defined the columns as the first dimension and rows
' as the second dimension to match convention. Had I wished, this would
' have allowed me to use the standard ReDim Preserve which can only
' adjust the last dimension. However, this does not match the
' syntax of Cells which has the row first. It may have been better to
' maintain your sequence so the two sequences were the same.
ReDim Years(1 To 3, 1 To NumRowsUnique)
RowYearsCrnt = 1
Years(1, RowYearsCrnt) = .Cells(2, 1).Value
Years(2, RowYearsCrnt) = 2
For RowSbCrnt = 3 To RowSbLast
If .Cells(RowSbCrnt, 1).Value <> .Cells(RowSbCrnt - 1, 1).Value Then
Years(3, RowYearsCrnt) = RowSbCrnt - 1
RowYearsCrnt = RowYearsCrnt + 1
Years(1, RowYearsCrnt) = .Cells(RowSbCrnt, 1).Value
Years(2, RowYearsCrnt) = RowSbCrnt
End If
Next
' Record final row for final string
Years(3, RowYearsCrnt) = RowSbLast
End With
Debug.Print "Duration: " & Format(Timer - TimeStart, "#,##0.000")
' Output diagnostics
For RowYearsCrnt = 1 To 9
Debug.Print Years(1, RowYearsCrnt) & "|" & _
Years(2, RowYearsCrnt) & "|" & _
Years(3, RowYearsCrnt) & "|"
Next
' Note that rows are now in the second dimension hence the 2 in UBound(Years, 2)
For RowYearsCrnt = UBound(Years, 2) - 9 To UBound(Years, 2)
Debug.Print Years(1, RowYearsCrnt) & "|" & _
Years(2, RowYearsCrnt) & "|" & _
Years(3, RowYearsCrnt) & "|"
Next
End Sub
Technique 3
The next macro is only slightly changed from the last.
Firstly, I have replaced the literals used to identify the column numbers in worksheets and arrays with constants such as:
Const ColYrEnd As Long = 3
Under my naming convention ColYrEnd = Column of Year array holding range End hence:
Years(ColYrEnd, RowYearsCrnt) = RowCvCrnt - 1
instead of Years(3, RowYearsCrnt) = RowCvCrnt - 1
This makes no difference to the compiled code but makes the source code easier to understand because you do not have to remember what columns 1, 2 and 3 hold. More importantly, if you ever have to rearrange the columns, updating the constants is the only change required. If you ever have to search through a long macro replacing every use of 2 as a column number (while ignoring any other use of 2) by 5, you will know why this is important.
Secondly, I have used:
ColValues = .Range(.Cells(1, ColSbYear), _
.Cells(RowSbLast, ColSbYear)).Value
to import column 1 to an array. The code that read the values from the worksheet now reads them from this array. Array access is much faster than worksheet access so this reduces the runtime from 1.5 seconds to .07 seconds.
The revised code is:
Sub Technique3()
Const ColCvYear As Long = 1
Const ColSbYear As Long = 1
Const ColYrYear As Long = 1
Const ColYrStart As Long = 2
Const ColYrEnd As Long = 3
Const RowSbDataFirst As Long = 2
Const RowCvDataFirst As Long = 2
Dim ColValues As Variant
Dim NumRowsUnique As Long
Dim RowCvCrnt As Long
Dim RowSbCrnt As Long
Dim RowSbLast As Long
Dim RowYearsCrnt As Long
Dim TimeStart As Double
Dim Years() As Variant
TimeStart = Timer ' Get the time as seconds since midnight to nearest .001
' of a second
With Worksheets("Simple Boundary")
RowSbLast = .Cells(Rows.Count, ColSbYear).End(xlUp).Row
ColValues = .Range(.Cells(1, ColSbYear), _
.Cells(RowSbLast, ColSbYear)).Value
' * The above statement imports all the data from column 1 as a two-dimensional
' array into a Variant. The Variant is then accessed as though it is an array.
' * The first dimension has one entry per row, the second dimension has one entry
' per column which is one in this case. Both dimensions will have a lower bound
' of one even if the first row or column loaded is not one.
End With
' Count number of unique rows.
' Assume all data rows are unique until found otherwise
NumRowsUnique = UBound(ColValues, 1) - 1
For RowCvCrnt = RowCvDataFirst + 1 To UBound(ColValues, 1)
If ColValues(RowCvCrnt, ColCvYear) = ColValues(RowCvCrnt - 1, ColCvYear) Then
NumRowsUnique = NumRowsUnique - 1
End If
Next
' I mentioned earlier that I was unsure if having rows and columns in the
' conventional sequence was correct. I am even less sure here where array
' ColValues has been loaded from a worksheet and the rows and columns are
' not in the conventional sequence.
ReDim Years(1 To 3, 1 To NumRowsUnique)
RowYearsCrnt = 1
Years(ColYrYear, RowYearsCrnt) = ColValues(RowCvDataFirst, ColCvYear)
Years(ColYrStart, RowYearsCrnt) = RowCvDataFirst
For RowCvCrnt = RowCvDataFirst + 1 To UBound(ColValues, 1)
If ColValues(RowCvCrnt, ColCvYear) <> ColValues(RowCvCrnt - 1, ColCvYear) Then
Years(ColYrEnd, RowYearsCrnt) = RowCvCrnt - 1
RowYearsCrnt = RowYearsCrnt + 1
Years(ColYrYear, RowYearsCrnt) = ColValues(RowCvCrnt, ColCvYear)
Years(ColYrStart, RowYearsCrnt) = RowCvCrnt
End If
Next
' Record final row for final string
Years(ColYrEnd, RowYearsCrnt) = UBound(ColValues, 1)
Debug.Print "Duration: " & Format(Timer - TimeStart, "#,##0.000")
' Output diagnostics
For RowYearsCrnt = 1 To 9
Debug.Print Years(ColYrYear, RowYearsCrnt) & "|" & _
Years(ColYrStart, RowYearsCrnt) & "|" & _
Years(ColYrEnd, RowYearsCrnt) & "|"
Next
' Note that rows are now in the second dimension hence the 2 in UBound(Years, 2)
For RowYearsCrnt = UBound(Years, 2) - 9 To UBound(Years, 2)
Debug.Print Years(ColYrYear, RowYearsCrnt) & "|" & _
Years(ColYrStart, RowYearsCrnt) & "|" & _
Years(ColYrEnd, RowYearsCrnt) & "|"
Next
End Sub
Other techniques
I considered introducing other techniques but I decided they were not useful for this requirement. Also, this answer is already long enough. I have provided much for you to think about and more would just be overload. As stated above I have reduced the run time for 35,000 rows from 8 minutes to 20 seconds to 1.5 seconds to .07 seconds.
Work slowly through my macros. I hope I have provided adequate explanation of what each is doing. Once you know a statement exists, it is generally easy to look it up so there is not too much explanation of the statements. Come back with questions as necessary.
Upvotes: 1
Reputation: 1126
As you mentioned in the comments, if you are going to continue this way you definitely need to move that redim inside the if statement:
If Not Cells(row, 1).Value = Cells(row - 1, 1).Value Then
Years = ReDimPreserve(Years, i, 3)
Years(i - 1, 3) = row - 1
Years(i, 1) = Cells(row, 1).Value
Years(i, 2) = row
i = i + 1
End If
I think this redimming multi-dimensional arrays is overkill for you. I have a few recommendations:
I notice that you are using 2 values to represent the start of a range and end of a range (Years(i, 2) is the start and Years(i, 3) is the end). Instead why not just use an actual range?
Create a Range variable called startNode and, when you find the end of the range, create a Range object with Range(startNode, endNode).
Your code will look something like this:
Sub DevideData()
Dim firstCell As Range
Dim nextRange As Range
' Activate the sheet first so the unqualified Cells calls refer to it
ThisWorkbook.Worksheets("Simple Boundary").Activate
Set firstCell = Cells(2,1)
TotalRows = ThisWorkbook.Worksheets("Simple Boundary").Range("A100000").End(xlUp).row
For row = 3 To TotalRows
If Not Cells(row, 1).Value = Cells(row - 1, 1).Value Then
Set nextRange = Range(firstCell, Cells(row-1,1))
Set firstCell = Cells(row,1)
End If
Next row
End Sub
Now you do not need to store 3 values! Just an array of ranges, which you can redim like this:
Dim years() As Range
'Do Stuff'
ReDim Preserve years(1 to i)
set years(i) = nextRange
i = i + 1
Note that the only reason that ReDimPreserve was created was so that you can redim both dimensions of a 2D array (normally you can only change the last dimension). With a 1D array you can freely redim without any troubles! :)
Lastly I recommend that you use a For Each loop instead of a regular For loop. It makes your intentions for the loop more explicit which makes your code more readable.
Dim firstCell as Range
Dim lastUniqueValue as Variant
Dim lastCell as Range
Dim iCell as Range
Set firstCell = Cells(3,1)
lastUniqueValue = firstCell.Value
Set lastCell = ThisWorkbook.Worksheets("Simple Boundary").Range("A100000").End(xlUp)
For Each iCell in Range(firstCell, lastCell)
If iCell.Value <> lastUniqueValue Then
lastUniqueValue = iCell.Value
'Do Stuff
End If
Next
Hope this helps! :)
Upvotes: 0
Reputation: 26640
As stated earlier in comments, ReDim Preserve is an expensive call when working with large datasets and is generally avoided. Here is some commented code that should perform as desired. Tested on a dataset with 200,000 rows, it took less than 5 seconds to complete. Tested on a dataset with 1000 rows, it took less than 0.1 seconds to complete.
The code uses a Collection to get the unique values out of column A, and then builds the array based on those unique values and outputs the results to another sheet. In your original code, there was nowhere that the resulting array was output, so I just made something up and you'll need to adjust the output section as needed.
Sub tgr()
Dim ws As Worksheet
Dim rngYears As Range
Dim collUnqYears As Collection
Dim varYear As Variant
Dim arrAllYears() As Variant
Dim arrYearsData() As Variant
Dim YearsDataIndex As Long
Set ws = ActiveWorkbook.Sheets("Simple Boundary")
Set rngYears = ws.Range("A1", ws.Cells(Rows.Count, "A").End(xlUp))
If rngYears.Cells.Count < 2 Then Exit Sub 'No data
Set collUnqYears = New Collection
With rngYears
.CurrentRegion.Sort rngYears, xlAscending, Header:=xlYes 'Sort data by year in column A
arrAllYears = .Offset(1).Resize(.Rows.Count - 1).Value 'Put list of years in array for faster calculation
'Get count of unique years by entering them into a collection (forces uniqueness)
For Each varYear In arrAllYears
On Error Resume Next
collUnqYears.Add CStr(varYear), CStr(varYear)
On Error GoTo 0
Next varYear
'Size the arrYearsData array appropriately
ReDim arrYearsData(1 To collUnqYears.Count, 1 To 3)
'arrYearsData column 1 = Unique Year value
'arrYearsData column 2 = Start row for the year
'arrYearsData column 3 = End row for the year
'Loop through unique values and populate the arrYearsData array with desired information
For Each varYear In collUnqYears
YearsDataIndex = YearsDataIndex + 1
arrYearsData(YearsDataIndex, 1) = varYear 'Unique year
arrYearsData(YearsDataIndex, 2) = .Find(varYear, .Cells(1), , , , xlNext).Row 'Start Row
arrYearsData(YearsDataIndex, 3) = .Find(varYear, .Cells(1), , , , xlPrevious).Row 'End Row
Next varYear
End With
'Here is where you would output your results
'Your original code did not output results anywhere, so adjust sheet and start cell as necessary
With Sheets("Sheet2")
.UsedRange.Offset(1).ClearContents 'Clear previous result data
.Range("A2").Resize(UBound(arrYearsData, 1), UBound(arrYearsData, 2)).Value = arrYearsData
.Select 'This will show the output sheet so you can see the results
End With
End Sub
Upvotes: 1