odoland
odoland

Reputation: 79

Deleting entire row on criteria cannot handle 400,000 rows

I have this macro to delete the entire rows for those that are not "chr9". I have a total of 401,094 rows. It seems to compile fine, but my Excel freezes and I have to Force Quit.

I think it may be an inefficient algorithm or maybe some error in the code?

Sub deleteNonChr9()
    Dim lastrow As Long
    Dim firstrow As Long
    Dim i As Long

    lastrow = 401094
    firstrow = 0

    ' Increment bottom of sheet to upwards
    For i = lastrow To firstrow Step -1
        If (Range("C1").Offset(i, 0) <> "chr9") Then
            Range("C1").Offset(i, 0).EntireRow.Delete
        End If
    Next i

End Sub

Upvotes: 2

Views: 1074

Answers (3)

Anastasiya-Romanova 秀
Anastasiya-Romanova 秀

Reputation: 3368

Major Progress

The following code for dealing with deleting a very large number of rows is inspired by Ron de Bruin - Excel Automation.

Sub QuickDeleteRows()
Dim Sheet_Data As Worksheet, NewSheet_Data As Worksheet
Dim Sheet_Name As String, ZeroTime As Double, Data As Range

On Error GoTo Error_Handler
SpeedUp True

Set Sheet_Data = Sheets("Test")
Sheet_Name = Sheet_Data.Name
LastRow = Cells(Rows.Count, "A").End(xlUp).Row
LastColumn = Cells(1, Columns.Count).End(xlToLeft).Column

Set Data = Sheet_Data.Range("A1", Cells(LastRow, LastColumn))

Set NewSheet_Data = Sheets.Add(After:=Sheet_Data)

Data.AutoFilter Field:=3, Criteria1:="=Chr9"
Data.Copy

With NewSheet_Data.Cells
    .PasteSpecial xlPasteColumnWidths
    .PasteSpecial xlPasteAll
    .Cells(1, 1).Select
    .Cells(1, 1).Copy
End With

Sheet_Data.Delete
NewSheet_Data.Name = Sheet_Name

Safe_Exit:
    SpeedUp False
    Exit Sub
Error_Handler:
    Resume Safe_Exit
End Sub

Sub SpeedUp(SpeedUpOn As Boolean)
With Application
    If SpeedUpOn Then
        .ScreenUpdating = False
        .EnableEvents = False
        .Calculation = xlCalculationManual
        .DisplayStatusBar = False
        .DisplayAlerts = False
    Else
        .ScreenUpdating = True
        .EnableEvents = True
        .Calculation = xlCalculationAutomatic
        .DisplayStatusBar = True
        .DisplayAlerts = True
    End If
End With
End Sub

While my old version of code takes time rather long (about 130 seconds on average) for handling sample data provided by Jeeped, but the code above completes less than 4.6 seconds for handling 400,000 rows of sample data on my machine. It's a very significant gain in performance!

System Information of my PC (Very Minimum Computer Configurations for Students)

  • Operating System: Windows 7 Professional 32-bit (6.1, Build 7601) Service Pack 1
  • System Manufacturer: Hewlett-Packard
  • System Model: HP Pro 3330 MT
  • Processor: Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz (4 CPUs), ~3.3GHz
  • Memory: 2048MB RAM

Original Answer

I'm aware that this answer is not really what the OP wants, but maybe this answer can be useful for other users and helpful to future users, if not the OP. Please see this answer as the alternative method.

Copy/paste, cut/insert, and delete entire row operations in Excel can take an excessively long time even when doing it in VBA Excel. For copy/paste and cut/insert operations the cause of the slowness is the formatting of the data itself. Memory over-allocation is another cause of those operations. So how do we resolve a situation like this? There are several things you can look for speeding up your code.

  1. Use arrays instead of the range of cells. It's usually considered to be faster than working on the range of cells because it ignores the formatting of the data in cells.
  2. Use .Value2 rather than the default property (.Value) because .Value2 will only treat all formatting numbers (currency, accounting, date, scientific, etc) as Doubles.

Suppose we have 10,000 rows of dummy data like the following dataset:

enter image description here

Instead of deleting entire rows of "non-chr9" data, I simply ignore those data and only consider the "chr9" data by copying all the "chr9" data into an array. How to code to implement such task? First of all, we must make a copy of our data to avoid data loss because we cannot undo all changes to recover the original data after running VBA Excel.

It seems you have done all the preparations needed. Now, we may start coding by first declaring every variable we need to the appropriate type of data.

Dim i As Long, j As Long, k As Long
Dim LastRow As Long, LastColumn As Long, LengthDataChr9 As Long

If you don't declare the variables, your code will run with those variables defaulting to the Variant type. While Variant can be enormously useful, but it can make your code slow. So, make sure each variable is declared with a sensible type. This is good programming practice and considerably faster.

Next, we determine all variables we will use to construct the size of arrays. We will need

LastRow = Cells(Rows.Count, "A").End(xlUp).Row
LastColumn = Cells(1, Columns.Count).End(xlToLeft).Column

LastRow and LastColumn are the row and column number of the last cell with data in one row or one column. Keep in mind, LastRow and LastColumn may not give you the desired row and column number if you are not setting them up properly or using a well-formatted data sheet. What I mean by a "well-formatted data sheet", is a worksheet with data that starts in cell A1 and the number of the rows in column A and columns in row 1 must be equal to the range of all data. In other words, the size of the range of all data must be equal to LastRowxLastColumn.

We also need the length of the array for storing all the "chr9" data. This can be done by counting all the "chr9" data using the following statement:

LengthDataChr9 = Application.CountIf(Columns("C"), "chr9")

We now know the size of the arrays and we can redimension it. Add the following code lines:

ReDim Data(1 To LastRow, 1 To LastColumn)
ReDim DataChr9(1 To LengthDataChr9, 1 To LastColumn)

We use ReDim instead of Dim because we use the dynamic arrays. VBA Excel has automatically declared the arrays defaulting to the Variant type, but they have no size yet. Next, we copy the data into the array Data by using statement

Data = Range("A1", Cells(LastRow, LastColumn)).Value2

We use .Value2 to improve the performance of the code (See speeding up tips point 2 above). Since the data has already copied to the array Data we may clear the worksheet data so we can use it to paste DataChr9.

Rows("1:" & Rows.Count).ClearContents

To clear everything (all contents, formats, etc.) on the worksheet, we may use Sheets("Sheet1").Cells.Clear or Sheet1.Cells.Clear. Next, we want the code to loop through the elements array Data in column 3 by using For ... Next statement because the desired data we're looking for are located there. If the element of array Data contains string "chr9" is found, the code then copying all the elements in the row where "chr9" is located into DataChr9. Again we use For ... Next statement. Here are the lines for implementing those procedures.

For i = 1 To UBound(Data)
    If Data(i, 3) = "chr9" Then
        j = j + 1
            For k = 1 To LastColumn
                DataChr9(j, k) = Data(i, k)
            Next k
    End If
Next i

where j = j + 1 is a counter for looping through the rows of DataChr9. The final step, we paste back all the elements of DataChr9 to the worksheet by adding this line to the code:

Range("A1", Cells(LengthDataChr9, LastColumn)) = DataChr9

and then you're done! Yay, finally!


OK, let's compile all the lines code above. We obtain

Sub DeleteNonChr9()
Dim i As Long, j As Long, k As Long
Dim LastRow As Long, LastColumn As Long, LengthDataChr9 As Long

LastRow = Cells(Rows.Count, "A").End(xlUp).Row
LastColumn = Cells(1, Columns.Count).End(xlToLeft).Column
LengthDataChr9 = Application.CountIf(Columns("C"), "chr9")

ReDim Data(1 To LastRow, 1 To LastColumn)
ReDim DataChr9(1 To LengthDataChr9, 1 To LastColumn)

Data = Range("A1", Cells(LastRow, LastColumn)).Value2
Rows("1:" & Rows.Count).ClearContents

For i = 1 To UBound(Data)
    If Data(i, 3) = "chr9" Then
        j = j + 1
            For k = 1 To LastColumn
                DataChr9(j, k) = Data(i, k)
            Next k
    End If
Next i

Range("A1", Cells(LengthDataChr9, LastColumn)) = DataChr9
End Sub

The performance of the code above is satisfying. It takes less than 0.5 seconds on average to complete the process of extracting all "chr9" data from 10,000 rows dummy data on my machine.

Upvotes: 1

user4039065
user4039065

Reputation:

The fastest way to conditionally delete rows is to have them all at the bottom of the data block. Sorting them into that position and deleting is faster than individual looping or even compiling a discontiguous Union of rows to delete.

When any group or cells is contiguous (i.e. all together) Excel does not have to work as hard to get rid of them. If they are at the bottom of the Worksheet.UsedRange property, Excel doesn't have to calculate what to fill the empty space with.

Your original code did not allow for a column header text label in row 1 but I will account for that. Modify to suit if you do not have one.

These will turn off the three primary parasites of computing power. Two have already been addressed in the comments and answers, the third Application.EnableEvents property can also make a valid contribution to Sub procedure efficiency whether you have event driven routines or not. See the helper Sub procedure at the bottom for details.

Sample data²: 500K rows of random data in A:Z. ~33% Chr9 in column C:C. Approximately 333K randomly discontiguous rows to delete.

chr9_before

Union and delete

Option Explicit

Sub deleteByUnion()
    Dim rw As Long, dels As Range

    On Error GoTo bm_Safe_Exit
    appTGGL bTGGL:=False          'disable parasitic environment

    With Worksheets("Sheet1")
        Set dels = .Cells(.Rows.Count, "C").End(xlUp).Offset(1)
        For rw = .Cells(.Rows.Count, "C").End(xlUp).Row To 2 Step -1
            If LCase$(.Cells(rw, "C").Value2) <> "chr9" Then
                Set dels = Union(dels, .Cells(rw, "C"))
            End If
        Next rw
        If Not dels Is Nothing Then
            dels.EntireRow.Delete
        End If
    End With

bm_Safe_Exit:
    appTGGL

End Sub

Elapsed time: <It has been 20 minutes... I'll update this when it finishes...>

Bulk load from worksheet to variant array, change, load back, sort and delete

Sub deleteByArrayAndSort()
    Dim v As Long, vals As Variant

    On Error GoTo bm_Safe_Exit
    appTGGL bTGGL:=False          'disable parasitic environment

    With Worksheets("Sheet1")
        With .Cells(1, 1).CurrentRegion
            .EntireRow.Hidden = False
            With .Resize(.Rows.Count - 1, .Columns.Count).Offset(1, 0)
               'bulk load column C values
                vals = .Columns(3).Value2

               'change non-Chr9 values into vbNullStrings
                For v = LBound(vals, 1) To UBound(vals, 1)
                    If LCase$(vals(v, 1)) <> "chr9" Then _
                      vals(v, 1) = vbNullString
                Next v

            End With

           'dump revised array back into column C
            .Cells(2, "C").Resize(UBound(vals, 1), UBound(vals, 2)) = vals

            'sort all of blank C's to the bottom
            .Cells.Sort Key1:=.Columns(3), Order1:=xlAscending, _
                               Orientation:=xlTopToBottom, Header:=xlYes

            'delete non-Chr9 contiguous rows at bottom of currentregion
            .Range(.Cells(.Rows.Count, "C").End(xlUp), .Cells(.Rows.Count, "C")).EntireRow.Delete

        End With
        .UsedRange   'reset the last_cell property
    End With

bm_Safe_Exit:
    appTGGL

End Sub

Elapsed time: 11.61 seconds¹
       (166,262 rows of data remaining²)

Original code

Elapsed time: <still waiting...>

Summary

There are obvious advantages to working within a variant array as well as deleting contiguous ranges. My sample data had ~66% of the rows to delete so it was a harsh task master. If there were 5 or 20 rows to delete, using an array to parse data for a sort may not be the best solution. You will have to make your own decisions based on your own data.

chr9 after

appTGGL helper Sub procedure

Public Sub appTGGL(Optional bTGGL As Boolean = True)
    With Application
        .ScreenUpdating = bTGGL
        .EnableEvents = bTGGL
        .Calculation = IIf(bTGGL, xlCalculationAutomatic, xlCalculationManual)
    End With
    Debug.Print Timer
End Sub

¹ Environment: old business class laptop with a mobile i5 and 8gbs of DRAM running WIN7 and Office 2013 (version 15.0.4805.1001 MSO 15.0.4815.1000 32-bit) - typical of the low end of the scale for performing this level of procedure.

² Sample data temporarily available at Deleting entire row cannot handle 400,000 rows.xlsb.

Upvotes: 1

user6432984
user6432984

Reputation:

Toggling ScreenUpdating and Calculation will help. But as Jeeped stated, applying a custom sort order is the way to go.

Sub deleteNonChr9()
    Dim lastrow As Long
    Dim firstrow As Long
    Dim i As Long

    lastrow = 401094
    firstrow = 1

    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    ' Increment bottom of sheet to upwards
    For i = lastrow To firstrow Step -1

        If (Cells(i, "C") <> "chr9") Then
            Rows(i).EntireRow.Delete
        End If

    Next i

    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic

End Sub

Upvotes: 1

Related Questions