Reputation: 3
My excel spreadsheet contains 11 columns and 500k rows. Each row is a sample from an 8-channel digital logic analyzer: column A is a time stamp; columns B through I are bit values (either a 1 or 0 in each cell); column J is a byte in binary created with CONCATENATE(B,C,D,E,F,G,H,I); and column K is that same byte in hexadecimal created with BIN2HEX(J).
The logic analyzer over-sampled the data considerably. I want to delete samples where the byte value did not change, keeping only the first sample in a series of sequential duplicates. In other words, I want to change this:
A B C D E F G H I J K
0.67497 1 0 0 1 1 1 1 0 10011110 9E
0.67498 1 0 0 1 1 1 0 1 10011101 9D
0.67499 1 0 0 1 1 1 0 1 10011101 9D
0.67500 1 0 0 1 1 1 0 1 10011101 9D
0.67501 1 0 0 1 1 1 1 0 10011110 9E
to this:
A B C D E F G H I J K
0.67497 1 0 0 1 1 1 1 0 10011110 9E
0.67498 1 0 0 1 1 1 0 1 10011101 9D
0.67501 1 0 0 1 1 1 1 0 10011110 9E
If I run the following code after selecting cell K1, it deletes the over-samples as I desire, but it runs very slowly. (It would take several days to finish.)
Sub DeleteOverSamples()
Do Until ActiveCell.Value = ""
If ActiveCell.Value = ActiveCell.Offset(-1, 0).Value Then
ActiveCell.EntireRow.Delete
ElseIf ActiveCell.Value <> ActiveCell.Offset(-1, 0).Value Then
ActiveCell.Offset(1, 0).Select
End If
Loop
End Sub
How can I make this more efficient? If the EntireRow.Delete is a time-consuming function, can I select multiple rows for deletion at a time (sometimes the repeated value repeats for hundreds of samples)? Many thanks!
Upvotes: 0
Views: 156
Reputation: 7567
Delete method is slow, and individual input/output of a cell's value is slow too.
Using a Variant
array is fast.
Sub test()
Dim vDB As Variant, vR() As Variant
Dim r As Long, c As Integer, n As Long, j As Integer
Dim s As String
vDB = Range("a1").CurrentRegion
r = UBound(vDB, 1)
c = UBound(vDB, 2)
s = vDB(1, 11)
n = n + 1
ReDim Preserve vR(1 To r, 1 To c)
For j = 1 To c
vR(n, j) = vDB(1, j)
Next j
For i = 1 To r
If s <> vDB(i, 11) Then
n = n + 1
For j = 1 To c
vR(n, j) = vDB(i, j)
Next j
s = vDB(i, 11)
End If
Next i
Sheets.Add
Range("a1").Resize(n, c) = vR
End Sub
Upvotes: 3
Reputation: 23974
The following code will create a new sheet, copying the relevant values from the first sheet:
Sub test()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Dim ws0 As Worksheet
Dim ws1 As Worksheet
Dim r0 As Long
Dim r1 As Long
Dim c As Long
Dim startTime As Single
startTime = Timer
Set ws0 = ActiveSheet
Set ws1 = Worksheets.Add
r0 = 1
r1 = 1
Do While Not IsEmpty(ws0.Cells(r0, 1).Value)
If r0 = 1 Then
ws1.Rows(r1).Range("A1:I1").Value = ws0.Rows(r0).Range("A1:I1").Value
r1 = r1 + 1
Else
For c = 2 To 9
If ws0.Cells(r0, c).Value <> ws0.Cells(r0 - 1, c).Value Then
ws1.Rows(r1).Range("A1:I1").Value = ws0.Rows(r0).Range("A1:I1").Value
r1 = r1 + 1
End If
Exit For
Next
End If
r0 = r0 + 1
Loop
MsgBox "Finished in " & (Timer - startTime) & " seconds"
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
End Sub
I tested this using the data you provided for the first few rows, and then replicating your last row for the next 499995 rows (except with column B being randomly selected between either 0
or 1
) and it took slightly over 20 seconds to copy approximately 250,000 rows of data. Without the random effect in column B, it took just over 19 seconds to copy the 3 rows that you would expect. A random effect in column I, instead of column B, took just over 28 seconds - which is probably the slowest it will go.
(It would be faster if it made use of your calculated column J or K, as it would only need to look at one cell each row instead of the 8 cells it currently looks at, but I wasn't sure whether you actually needed those columns or whether you only added them to make your existing code easier.)
Upvotes: 2