gotmike
gotmike

Reputation: 1615

Cleaning up an array

I have a very large array in VBA which includes a lot of 0 values that I'd like to remove. Something like this:

A    B    C    12345
D    E    F    848349
G    H    I    0
J    K    L    0
M    N    O    0
P    Q    R    4352
S    T    U    0
V    W    X    0

I would like to be able to quickly/easily strip out all rows from this array that have a zero in the 4th column, resulting in something like this:

A    B    C    12345
D    E    F    848349
P    Q    R    4352

This array has 100,000 or so rows, that hopefully gets down to a number closer to 20,000 or 30,000 rows instead after processing.

I assume iterating through every entry will prove very time-consuming.

Is there another way that is faster?

Upvotes: 0

Views: 391

Answers (2)

NeepNeepNeep
NeepNeepNeep

Reputation: 913

Okay, it's all off-sheet, so all the arrays are zero-based. To test this set-up, I created a worksheet with four columns, as per your data and using random numbers in the fourth column. I saved this to a text file (TestFile.txt), then read it in to be able to get a zero-based array (Excel ranges are 1-based when you take them into an array). I saved 150000 rows to the text file to properly stress the routine. Yes, I have an SSD and that would affect the 2s run time, but I'd still expect it to run in <10s on a spinning HDD, I think.

Anyway, here's the code (requires a VBA reference to Microsoft Scripting Runtime purely to read in the file):

Public Function ReturnFilteredArray(arrSource As Variant, _
                                strValueToFilterOut As String) As Variant
Dim arrDestination      As Variant
Dim lngSrcCounter       As Long
Dim lngDestCounter      As Long

ReDim arrDestination(UBound(arrSource, 1) + 1, UBound(arrSource, 2) + 1)

lngDestCounter = 1
For lngSrcCounter = LBound(arrSource, 1) To UBound(arrSource, 1)
    ' Assuming the array dimensions are (100000, 3)
    If CStr(arrSource(lngSrcCounter, 3)) <> strValueToFilterOut Then
        ' Hit an element we want to include
        arrDestination(lngDestCounter, 1) = arrSource(lngSrcCounter, 0)
        arrDestination(lngDestCounter, 2) = arrSource(lngSrcCounter, 1)
        arrDestination(lngDestCounter, 3) = arrSource(lngSrcCounter, 2)
        arrDestination(lngDestCounter, 4) = arrSource(lngSrcCounter, 3)

        lngDestCounter = lngDestCounter + 1
    End If
Next

ReturnFilteredArray = arrDestination
End Function

Sub TestRun()
Dim fso As FileSystemObject
Dim txs As TextStream
Dim arr As Variant
Dim arr2 As Variant
Dim lngCounter As Long

Debug.Print Now()
Set fso = New FileSystemObject
Set txs = fso.OpenTextFile("E:\Users\Thingy\Desktop\TestFile.txt", ForReading)
arr = Split(txs.ReadAll, vbNewLine)
ReDim arr2(UBound(arr), 3)

For lngCounter = 0 To UBound(arr) - 1
    arr2(lngCounter, 0) = Split(arr(lngCounter), vbTab)(0)
    arr2(lngCounter, 1) = Split(arr(lngCounter), vbTab)(1)
    arr2(lngCounter, 2) = Split(arr(lngCounter), vbTab)(2)
    arr2(lngCounter, 3) = Split(arr(lngCounter), vbTab)(3)
Next

arr2 = ReturnFilteredArray(arr2, "0")
Range("L2").Resize(UBound(arr2, 1), 5) = arr2

Debug.Print Now()
End Sub

There are a number of assumptions in there, not least the dimensions. Note the difference in the second dimension counter between arrDestination and arrSource. That's to do with Excel being 1-based and normal arrays being 0-based.

Also, when I'm writing out the array, I needed to bump up the second dimension to 5 in order to get all of the array out to the sheet. I wasn't able to trim off the empty elements since ReDim Preserve only works on the uppermost dimension (columns) and it's the first dimension (rows) that's changing here.

Anywho, this should serve as a reminder that despite its faults Excel is pretty amazing.

Upvotes: 1

Ambie
Ambie

Reputation: 4977

I'm not aware of any other way in VBA than to loop through the array and write another array/list.

What makes it trickier is that your array looks to be two-dimensional and VBA will only allow you to redim the last dimension. From the look of your data, you'd want to redim the first dimension as you iterate through your array.

There are several solutions:

  1. Iterate your data twice - once to get the array size (and probably to store the relevant row numbers) and a second time to transfer the raw data into your new data.

  2. Iterate once and just reverse your dimensions (ie row is last).

  3. Use an array of arrays, so that each array only has one dimension).

  4. Use a Collection which doesn't need to be dimensioned - this would be my preferred option.

Option 4 would look like this (I've assumed your array is zero based):

Dim resultList As Collection
Dim r As Long

Set resultList = New Collection
For r = 0 To UBound(raw, 1)
    If raw(r, 3) <> 0 Then
        resultList.Add Array(raw(r, 0), raw(r, 1), raw(r, 2), raw(r, 3))
    End If
Next

If you have to write to a new array, then here's an example of Option 1:

Dim rowList As Collection
Dim result() As Variant
Dim r As Long
Dim c As Long
Dim v As Variant

Set rowList = New Collection
For r = 0 To UBound(raw, 1)
    If raw(r, 3) <> 0 Then
        rowList.Add r
    End If
Next

ReDim result(rowList.Count - 1, 3) As Variant
c = 0
For Each v In rowList
    result(c, 0) = raw(v, 0)
    result(c, 1) = raw(v, 1)
    result(c, 2) = raw(v, 2)
    result(c, 3) = raw(v, 3)
    c = c + 1
Next

Upvotes: 3

Related Questions