Reputation: 3
I have a data set in excel with measurements taken every 15 minutes (one measurement one row), where I would like to keep only the ones taken on a full hour and delete the rest. In other words: I would like to keep the first row, delete the next three, keep the 5th row, delete the next three and so on. I would like to use VBA but I'm completely new to this. I've found this macro here, which deletes every four rows
Sub remove_rows()
Dim x As Long
Application.ScreenUpdating = False
For x = 100 To 1 Step -5
Range(x & ":" & x - 3).EntireRow.Delete
Next x
Application.ScreenUpdating = True
End Sub
(Source: Delete every four rows in excel) How do I change it to delete only every three rows? I have to do the same with a dataset with measurements taken every 5 minutes (keep the 1 row, delete the next eleven, keep the 12th and so on). Is this macro good for this dataset as well? And lastly - is using VBA the best solution for this problem or is there another method which is better? The datasets are fairly big (100k+ rows).
Upvotes: 0
Views: 4726
Reputation: 84465
Powerquery (2016 in the data tab > Get and Transform, 2013 free add-in from microsoft then powerquery tab) is absolutely optimized for this sort of operation over the number of rows you mention. It takes < 1 minute to write the query.
.66 of a second on a test with 200K rows to complete task.
1) Select row in data, then data > from table
2) Indicate whether you table has headers
3) Query editor screen pops up
4) Select Home > Remove rows > Remove alternate row
5) Specify pattern. For example, keep 1 remove 3 as in your example:
6) Hit Ok. Observe new pattern fits requirement
7) Home > Close and Load >
8) Specify where to close and load to e.g. new sheet
You can load to same sheet or new sheet. Bingo you have your new data set. Anytime you add rows to your original dataset you can simply refresh this query and the result set will update removing the unwanted rows.
Output:
Refresh query (green circling arrows):
1) Next to workbook query itself
2) Or from ribbon > Refresh
Upvotes: 1
Reputation: 10715
1st version bellow (AutoFilter
) is very fast - 2 seconds, for 100 K rows
To change the row interval, update the 4
in Const FRM
(formula)
.
Version 1 - using an AuroFilter
Option Explicit
'Deleted Rows: 75,000 (out of 100,000) - Time: 2.341 sec
Public Sub DeleteRowSetsAutoFilter()
Const FRM = "=MOD(ROW() - 1, 4) = 0" 'Rows where reminder of Row/4 = 0
Dim ws1 As Worksheet, ws2 As Worksheet, wsName As String, fc As Range
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Set ws1 = ActiveSheet
Set ws2 = Sheets.Add(After:=ws1) 'Add new sheet
wsName = ws1.Name
Set fc = ws1.UsedRange.Columns(ws1.UsedRange.Columns.Count + 1) 'Filter column
fc.Formula = FRM
fc.AutoFilter Field:=1, Criteria1:="TRUE" 'Rows to be deleted: 2 To 4, 6 To 8, ...
ws1.UsedRange.Copy 'Copy visible rows to new sheet
ws2.Cells.PasteSpecial xlPasteColumnWidths
ws2.Cells.PasteSpecial xlPasteAll 'Paste data on new sheet
ws1.Delete 'Delete old sheet
ws2.Name = wsName
ws2.Cells(1).Select
ws2.Columns(ws2.UsedRange.Columns.Count).EntireColumn.Delete 'Delete filter column
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Application.CutCopyMode = False
End Sub
.
Version 2 - using a For
loop
Public Sub DeleteRowSetsForLoop()
Const STP = 4 'Row interval
Dim ws As Worksheet, lr As Long, i As Long, toDel As Range
Set ws = ActiveSheet
lr = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
Set toDel = ws.Rows(lr + 1) 'First empty row (just to set the range)
For i = 1 To lr Step STP
Set toDel = Union(toDel, ws.Rows(i + 1 & ":" & i + (STP - 1))) '2-4, 6-8, etc.
Next
toDel.EntireRow.Delete
End Sub
.
Rows: 2,500 (out of 10 K)
DeleteRowSetsAutoFilter() - Time: 0.085 sec, 0.086 sec, 0.089 sec
DeleteRowSetsForLoop() - Time: 9.568 sec, 9.524 sec, 9.530 sec
Upvotes: 0
Reputation: 343
I think the code above is a little dangerous and will require tweaking for a different total number of rows. You could modify as follows:
For x = 100 To 1 Step -4
Range(x & ":" & x - 2).EntireRow.Delete
Next x
But you have to make sure you're starting at the right place and preserving the right cells. The Step -4 steps back four cells at a time, then the following line deletes row x, x-1, and x-2.
Why not just create a column that indicates whether the observation ends at an hour, then sort the list by that column and delete everything after the transition point? It's less automated, but also less likely to cause a problem.
If you really wanted to go the VBA route, I'd check in the code to ensure the observation is hourly and only then delete. I'm not big into trusting my data, though.
Upvotes: 0