Ratigan
Ratigan

Reputation: 3

How to delete an entire row in Excel if same cell in preceding row is identical?

My excel spreadsheet contains 11 columns and 500k rows. Each row is a sample from an 8-channel digital logic analyzer: column A is a time stamp; columns B through I are bit values (either a 1 or 0 in each cell); column J is a byte in binary created with CONCATENATE(B,C,D,E,F,G,H,I); and column K is that same byte in hexadecimal created with BIN2HEX(J).

The logic analyzer over-sampled the data considerably. I want to delete samples where the byte value did not change, keeping only the first sample in a series of sequential duplicates. In other words, I want to change this:

A        B  C  D  E  F  G  H  I  J         K
0.67497  1  0  0  1  1  1  1  0  10011110  9E
0.67498  1  0  0  1  1  1  0  1  10011101  9D
0.67499  1  0  0  1  1  1  0  1  10011101  9D
0.67500  1  0  0  1  1  1  0  1  10011101  9D
0.67501  1  0  0  1  1  1  1  0  10011110  9E

to this:

A        B  C  D  E  F  G  H  I  J         K
0.67497  1  0  0  1  1  1  1  0  10011110  9E
0.67498  1  0  0  1  1  1  0  1  10011101  9D
0.67501  1  0  0  1  1  1  1  0  10011110  9E

If I run the following code after selecting cell K1, it deletes the over-samples as I desire, but it runs very slowly. (It would take several days to finish.)

Sub DeleteOverSamples()
  Do Until ActiveCell.Value = ""
    If ActiveCell.Value = ActiveCell.Offset(-1, 0).Value Then
      ActiveCell.EntireRow.Delete
    ElseIf ActiveCell.Value <> ActiveCell.Offset(-1, 0).Value Then
      ActiveCell.Offset(1, 0).Select
    End If
  Loop
End Sub

How can I make this more efficient? If the EntireRow.Delete is a time-consuming function, can I select multiple rows for deletion at a time (sometimes the repeated value repeats for hundreds of samples)? Many thanks!

Upvotes: 0

Views: 156

Answers (2)

Dy.Lee
Dy.Lee

Reputation: 7567

Delete method is slow, and individual input/output of a cell's value is slow too. Using a Variant array is fast.

Sub test()
    Dim vDB As Variant, vR() As Variant
    Dim r As Long, c As Integer, n As Long, j As Integer
    Dim s As String

    vDB = Range("a1").CurrentRegion
    r = UBound(vDB, 1)
    c = UBound(vDB, 2)

    s = vDB(1, 11)

    n = n + 1
    ReDim Preserve vR(1 To r, 1 To c)
    For j = 1 To c
        vR(n, j) = vDB(1, j)
    Next j

    For i = 1 To r
        If s <> vDB(i, 11) Then
            n = n + 1
            For j = 1 To c
                vR(n, j) = vDB(i, j)
            Next j
            s = vDB(i, 11)
        End If
    Next i
    Sheets.Add
    Range("a1").Resize(n, c) = vR

End Sub

Upvotes: 3

YowE3K
YowE3K

Reputation: 23974

The following code will create a new sheet, copying the relevant values from the first sheet:

Sub test()
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Dim ws0 As Worksheet
    Dim ws1 As Worksheet
    Dim r0 As Long
    Dim r1 As Long
    Dim c As Long
    Dim startTime As Single
    startTime = Timer

    Set ws0 = ActiveSheet
    Set ws1 = Worksheets.Add
    r0 = 1
    r1 = 1
    Do While Not IsEmpty(ws0.Cells(r0, 1).Value)
        If r0 = 1 Then
            ws1.Rows(r1).Range("A1:I1").Value = ws0.Rows(r0).Range("A1:I1").Value
            r1 = r1 + 1
        Else
            For c = 2 To 9
                If ws0.Cells(r0, c).Value <> ws0.Cells(r0 - 1, c).Value Then
                    ws1.Rows(r1).Range("A1:I1").Value = ws0.Rows(r0).Range("A1:I1").Value
                    r1 = r1 + 1
                End If
                Exit For
            Next
        End If
        r0 = r0 + 1
    Loop

    MsgBox "Finished in " & (Timer - startTime) & " seconds"
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub

I tested this using the data you provided for the first few rows, and then replicating your last row for the next 499995 rows (except with column B being randomly selected between either 0 or 1) and it took slightly over 20 seconds to copy approximately 250,000 rows of data. Without the random effect in column B, it took just over 19 seconds to copy the 3 rows that you would expect. A random effect in column I, instead of column B, took just over 28 seconds - which is probably the slowest it will go.

(It would be faster if it made use of your calculated column J or K, as it would only need to look at one cell each row instead of the 8 cells it currently looks at, but I wasn't sure whether you actually needed those columns or whether you only added them to make your existing code easier.)

Upvotes: 2

Related Questions