Reputation: 10715
I am trying to find a way to filter large data and remove rows in a worksheet, in less than one minute
The goal:
.
Test Data:
:
.
How the code works:
If the workbook is not empty and the text value to be removed exists in column 1
When it finds a match:
"A11,A275,A3900,..."
.Range("A11,A275,A3900,...").EntireRow.Delete Shift:=xlUp
.
The main issue is the Delete operation, and total duration time should be under one minute. Any code-based solution is acceptable as long as it performs under 1 minute.
This narrows the scope to very few acceptable answers. The answers already provided are also very short and easy to implement. One performs the operation in about 30 seconds, so there is at least one answer that provides an acceptable solution, and other may find it useful as well
.
My main initial function:
Sub DeleteRowsWithValuesStrings()
Const MAX_SZ As Byte = 240
Dim i As Long, j As Long, t As Double, ws As Worksheet
Dim memArr As Variant, max As Long, tmp As String
Set ws = Worksheets(1)
max = GetMaxCell(ws.UsedRange).Row
FastWB True: t = Timer
With ws
If max > 1 Then
If IndexOfValInRowOrCol("Test String", , ws.UsedRange) > 0 Then
memArr = .Range(.Cells(1, 1), .Cells(max, 1)).Value2
For i = max To 1 Step -1
If memArr(i, 1) = "Test String" Then
tmp = tmp & "A" & i & ","
If Len(tmp) > MAX_SZ Then
.Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp
tmp = vbNullString
End If
End If
Next
If Len(tmp) > 0 Then
.Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp
End If
.Calculate
End If
End If
End With
FastWB False: InputBox "Duration: ", "Duration", Timer - t
End Sub
Helper functions (turn Excel features off and on):
Public Sub FastWB(Optional ByVal opt As Boolean = True)
With Application
.Calculation = IIf(opt, xlCalculationManual, xlCalculationAutomatic)
.DisplayAlerts = Not opt
.DisplayStatusBar = Not opt
.EnableAnimations = Not opt
.EnableEvents = Not opt
.ScreenUpdating = Not opt
End With
FastWS , opt
End Sub
Public Sub FastWS(Optional ByVal ws As Worksheet = Nothing, _
Optional ByVal opt As Boolean = True)
If ws Is Nothing Then
For Each ws In Application.ActiveWorkbook.Sheets
EnableWS ws, opt
Next
Else
EnableWS ws, opt
End If
End Sub
Private Sub EnableWS(ByVal ws As Worksheet, ByVal opt As Boolean)
With ws
.DisplayPageBreaks = False
.EnableCalculation = Not opt
.EnableFormatConditionsCalculation = Not opt
.EnablePivotTable = Not opt
End With
End Sub
Finds last cell with data (thanks @ZygD - now I tested it in several scenarios):
Public Function GetMaxCell(Optional ByRef rng As Range = Nothing) As Range
'Returns the last cell containing a value, or A1 if Worksheet is empty
Const NONEMPTY As String = "*"
Dim lRow As Range, lCol As Range
If rng Is Nothing Then Set rng = Application.ActiveWorkbook.ActiveSheet.UsedRange
If WorksheetFunction.CountA(rng) = 0 Then
Set GetMaxCell = rng.Parent.Cells(1, 1)
Else
With rng
Set lRow = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
After:=.Cells(1, 1), _
SearchDirection:=xlPrevious, _
SearchOrder:=xlByRows)
If Not lRow Is Nothing Then
Set lCol = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
After:=.Cells(1, 1), _
SearchDirection:=xlPrevious, _
SearchOrder:=xlByColumns)
Set GetMaxCell = .Parent.Cells(lRow.Row, lCol.Column)
End If
End With
End If
End Function
Returns the index of a match in the array, or 0 if a match is not found:
Public Function IndexOfValInRowOrCol( _
ByVal searchVal As String, _
Optional ByRef ws As Worksheet = Nothing, _
Optional ByRef rng As Range = Nothing, _
Optional ByRef vertical As Boolean = True, _
Optional ByRef rowOrColNum As Long = 1 _
) As Long
'Returns position in Row or Column, or 0 if no matches found
Dim usedRng As Range, result As Variant, searchRow As Long, searchCol As Long
result = CVErr(9999) '- generate custom error
Set usedRng = GetUsedRng(ws, rng)
If Not usedRng Is Nothing Then
If rowOrColNum < 1 Then rowOrColNum = 1
With Application
If vertical Then
result = .Match(searchVal, rng.Columns(rowOrColNum), 0)
Else
result = .Match(searchVal, rng.Rows(rowOrColNum), 0)
End If
End With
End If
If IsError(result) Then IndexOfValInRowOrCol = 0 Else IndexOfValInRowOrCol = result
End Function
.
Update:
Tested 6 solutions (3 tests each): Excel Hero's solution is the fastest so far (removes formulas)
.
Here are the results, fastest to the slowest:
.
Test 1. Total of 100,000 records, 10,000 to be deleted:
1. ExcelHero() - 1.5 seconds
2. DeleteRowsWithValuesNewSheet() - 2.4 seconds
3. DeleteRowsWithValuesStrings() - 2.45 minutes
4. DeleteRowsWithValuesArray() - 2.45 minutes
5. QuickAndEasy() - 3.25 minutes
6. DeleteRowsWithValuesUnion() - Stopped after 5 minutes
.
Test 2. Total of 1 million records, 100,000 to be deleted:
1. ExcelHero() - 16 seconds (average)
2. DeleteRowsWithValuesNewSheet() - 33 seconds (average)
3. DeleteRowsWithValuesStrings() - 4 hrs 38 min (16701.375 sec)
4. DeleteRowsWithValuesArray() - 4 hrs 37 min (16626.3051757813 sec)
5. QuickAndEasy() - 5 hrs 40 min (20434.2104492188 sec)
6. DeleteRowsWithValuesUnion() - N/A
.
Notes:
I also made the test data more realistic by introducing unusual values:
Upvotes: 39
Views: 43849
Reputation:
I know I'm incredibly late with my answer here however future visitors may find it very useful.
Please Note: My approach requires an index column for the rows to end up in the original order, however if you do not mind the rows being in a different order then an index column isn't needed and the additional line of code can be removed.
My approach: My approach was to simply select all the rows in the selected range (column), sort them in ascending order using Range.Sort
and then collecting the first and last index of "Test String"
within the selected range (column). I then create a range from the first and last indices and use Range.EntrieRow.Delete
to remove all the rows which contain "Test String"
.
Pros:
- It is blazing fast.
- It doesn't remove formatting, formulas, charts, pictures or anything like the method which copies to a new sheet.
Cons:
- A decent size of code to implement however it is all straight-forward.
Test Range Generation Sub:
Sub DevelopTest()
Dim index As Long
FastWB True
ActiveSheet.UsedRange.Clear
For index = 1 To 1000000 '1 million test
ActiveSheet.Cells(index, 1).Value = index
If (index Mod 10) = 0 Then
ActiveSheet.Cells(index, 2).Value = "Test String"
Else
ActiveSheet.Cells(index, 2).Value = "Blah Blah Blah"
End If
Next index
Application.StatusBar = ""
FastWB False
End Sub
Filter And Delete Rows Sub:
Sub DeleteRowFast()
Dim curWorksheet As Worksheet 'Current worksheet vairable
Dim rangeSelection As Range 'Selected range
Dim startBadVals As Long 'Start of the unwanted values
Dim endBadVals As Long 'End of the unwanted values
Dim strtTime As Double 'Timer variable
Dim lastRow As Long 'Last Row variable
Dim lastColumn As Long 'Last column variable
Dim indexCell As Range 'Index range start
Dim sortRange As Range 'The range which the sort is applied to
Dim currRow As Range 'Current Row index for the for loop
Dim cell As Range 'Current cell for use in the for loop
On Error GoTo Err
Set rangeSelection = Application.InputBox("Select the (N=) range to be checked", "Get Range", Type:=8) 'Get the desired range from the user
Err.Clear
M1 = MsgBox("This is recommended for large files (50,000 or more entries)", vbYesNo, "Enable Fast Workbook?") 'Prompt the user with an option to enable Fast Workbook, roughly 150% performace gains... Recommended for incredibly large files
Select Case M1
Case vbYes
FastWB True 'Enable fast workbook
Case vbNo
FastWB False 'Disable fast workbook
End Select
strtTime = Timer 'Begin the timer
Set curWorksheet = ActiveSheet
lastRow = CLng(rangeSelection.SpecialCells(xlCellTypeLastCell).Row)
lastColumn = curWorksheet.Cells(1, 16384).End(xlToLeft).Column
Set indexCell = curWorksheet.Cells(1, 1)
On Error Resume Next
If rangeSelection.Rows.Count > 1 Then 'Check if there is anything to do
lastVisRow = rangeSelection.Rows.Count
Set sortRange = curWorksheet.Range(indexCell, curWorksheet.Cells(curWorksheet.Rows(lastRow).Row, 16384).End(xlToLeft)) 'Set the sort range
sortRange.Sort Key1:=rangeSelection.Cells(1, 1), Order1:=xlAscending, Header:=xlNo 'Sort by values, lowest to highest
startBadVals = rangeSelection.Find(What:="Test String", LookAt:=xlWhole, MatchCase:=False).Row
endBadVals = rangeSelection.Find(What:="Test String", LookAt:=xlWhole, SearchDirection:=xlPrevious, MatchCase:=False).Row
curWorksheet.Range(curWorksheet.Rows(startBadVals), curWorksheet.Rows(endBadVals)).EntireRow.Delete 'Delete uneeded rows, deleteing in continuous range blocks is quick than seperated or individual deletions.
sortRange.Sort Key1:=indexCell, Order1:=xlAscending, Header:=xlNo 'Sort by index instead of values, lowest to highest
End If
Application.StatusBar = "" 'Reset the status bar
FastWB False 'Disable fast workbook
MsgBox CStr(Round(Timer - strtTime, 2)) & "s" 'Display duration of task
Err:
Exit Sub
End Sub
THIS CODE USES FastWB
, FastWS
AND EnableWS
BY Paul Bica!
Times at 100K entries (10k to be removed, FastWB True):
1. 0.2 seconds.
2. 0.2 seconds.
3. 0.21 seconds.
Avg. 0.2 seconds.
Times at 1 million entries (100k to be removed, FastWB True):
1. 2.3 seconds.
2. 2.32 seconds.
3. 2.3 seconds.
Avg. 2.31 seconds.
Running on: Windows 10, iMac i3 11,2 (From 2010)
EDIT
This code was originally designed with the purpose of filtering out numeric values outside of a numeric range and has been adapted to filter out "Test String"
so some of the code may be redundant.
Upvotes: 2
Reputation: 14764
A significant gain in speed can be achieved if the source data do not contain formulas, or if the scenario would allow (or want) the formulas to be converted into hard values during the conditional row deletions.
With the above as a caveat, my solution uses the AdvancedFilter of the range object. It's about twice as fast as DeleteRowsWithValuesNewSheet().
Public Sub ExcelHero()
Dim t#, crit As Range, data As Range, ws As Worksheet
Dim r&, fc As Range, lc As Range, fr1 As Range, fr2 As Range
FastWB True
t = Timer
Set fc = ActiveSheet.UsedRange.Item(1)
Set lc = GetMaxCell
Set data = ActiveSheet.Range(fc, lc)
Set ws = Sheets.Add
With data
Set fr1 = data.Worksheet.Range(fc, fc.Offset(, lc.Column))
Set fr2 = ws.Range(ws.Cells(fc.Row, fc.Column), ws.Cells(fc.Row, lc.Column))
With fr2
fr1.Copy
.PasteSpecial xlPasteColumnWidths: .PasteSpecial xlPasteAll
.Item(1).Select
End With
Set crit = .Resize(2, 1).Offset(, lc.Column + 1)
crit = [{"Column 1";"<>Test String"}]
.AdvancedFilter xlFilterCopy, crit, fr2
.Worksheet.Delete
End With
FastWB False
r = ws.UsedRange.Rows.Count
Debug.Print "Rows: " & r & ", Duration: " & Timer - t & " seconds"
End Sub
Upvotes: 14
Reputation: 10715
I'm providing the first answer as a reference
Others may find it useful, if there are no other options available
.
Sub DeleteRowsWithValuesNewSheet() '100K records 10K to delete
'Test 1: 2.40234375 sec
'Test 2: 2.41796875 sec
'Test 3: 2.40234375 sec
'1M records 100K to delete
'Test 1: 32.9140625 sec
'Test 2: 33.1484375 sec
'Test 3: 32.90625 sec
Dim oldWs As Worksheet, newWs As Worksheet, rowHeights() As Long
Dim wsName As String, t As Double, oldUsedRng As Range
FastWB True: t = Timer
Set oldWs = Worksheets(1)
wsName = oldWs.Name
Set oldUsedRng = oldWs.Range("A1", GetMaxCell(oldWs.UsedRange))
If oldUsedRng.Rows.Count > 1 Then 'If sheet is not empty
Set newWs = Sheets.Add(After:=oldWs) 'Add new sheet
With oldUsedRng
.AutoFilter Field:=1, Criteria1:="<>Test String"
.Copy 'Copy visible data
End With
With newWs.Cells
.PasteSpecial xlPasteColumnWidths
.PasteSpecial xlPasteAll 'Paste data on new sheet
.Cells(1, 1).Select 'Deselect paste area
.Cells(1, 1).Copy 'Clear Clipboard
End With
oldWs.Delete 'Delete old sheet
newWs.Name = wsName
End If
FastWB False: InputBox "Duration: ", "Duration", Timer - t
End Sub
.
At high level:
.AutoFilter Field:=1, Criteria1:="<>Test String"
It uses the same helper functions posted in the question
The 99% of the duration is used by the AutoFilter
.
There are a couple limitations I found so far, the first can be addressed:
If there are any hidden rows on the initial sheet, it unhides them
VBA related:
.
A few notes about using large files like this:
Unmanaged Conditional Formatting rules can cause exponential performance issues
Reading file or data from network is much slower than working with a locall file
Upvotes: 16
Reputation: 96753
On my elderly Dell Inspiron 1564 (Win 7 Office 2007) this:
Sub QuickAndEasy()
Dim rng As Range
Set rng = Range("AA2:AA1000001")
Range("AB1") = Now
Application.ScreenUpdating = False
With rng
.Formula = "=If(A2=""Test String"",0/0,A2)"
.Cells.SpecialCells(xlCellTypeFormulas, xlErrors).EntireRow.Delete
.Clear
End With
Application.ScreenUpdating = True
Range("AC1") = Now
End Sub
took about 10 seconds to run. I am assuming that column AA is available.
EDIT#1:
Please note that this code does not set Calculation to Manual. Performance will improve if the Calculation mode is set to Manual after the "helper" column is allowed to calculate.
Upvotes: 6
Reputation: 46
Your use of arrays in calculating the used range and row count may effect the performance. Here's another approach which in testing proves efficient across 1m+ rows of data - between 25-30 seconds. It doesn't use filters so will delete rows even if hidden. Deleting a whole row won't effect formatting or column widths of the other remaining rows.
First, check if the ActiveSheet has "Test String". Since you're only interested in Column 1 I used this:
TCount = Application.WorksheetFunction.CountIf(sht.Columns(1), "Test String")
If TCount > 0 Then
Instead of using your GetMaxCell() function I simply used Cells.SpecialCells(xlCellTypeLastCell).Row
to get the last row:
EndRow = sht.Cells.SpecialCells(xlCellTypeLastCell).Row
Then loop through the rows of data:
While r <= EndRow
To test if the cell in Column 1 is equal to "Test String":
If sht.Cells(r, 1).Text) = "Test String" Then
To delete the row:
Rows(r).Delete Shift:=xlUp
Putting it all together full code below. I've set ActiveSheet to a variable Sht and added turned of ScreenUpdating to improve efficiency. Since it's a lot of data I make sure to clear variables at the end.
Sub RowDeleter()
Dim sht As Worksheet
Dim r As Long
Dim EndRow As Long
Dim TCount As Long
Dim s As Date
Dim e As Date
Application.ScreenUpdating = True
r = 2 'Initialise row number
s = Now 'Start Time
Set sht = ActiveSheet
EndRow = sht.Cells.SpecialCells(xlCellTypeLastCell).Row
'Check if "Test String" is found in Column 1
TCount = Application.WorksheetFunction.CountIf(sht.Columns(1), "Test String")
If TCount > 0 Then
'loop through to the End row
While r <= EndRow
If InStr(sht.Cells(r, 1).Text, "Test String") > 0 Then
sht.Rows(r).Delete Shift:=xlUp
r = r - 1
End If
r = r + 1
Wend
End If
e = Now 'End Time
D = (Hour(e) * 360 + Minute(e) * 60 + Second(e)) - (Hour(s) * 360 + Minute(s) * 60 + Second(s))
Application.ScreenUpdating = True
DurationTime = TimeSerial(0, 0, D)
MsgBox Format(DurationTime, "hh:mm:ss")
End Sub
Upvotes: 0