Reputation: 13
I have two sets of data that need to be matched based on IDs and timestamp (+/- 3 units converted from time), and below is the formula that I've been using in Excel to do the matching. Recently I've had to run this formula on up to 1 million rows in Excel, and it takes a REALLY long time, crashes too. I'm wondering if there is a faster way to do this, if not in Excel?
=INDEX(A:A,MATCH(1,--(B:B=E3)*--(ABS(C:C-F3)<=3),0),1)
Data Set 1: Column A: States Column B: IDs Column C: Timestamp
Data Set 2: Column D: Email Addresses Column E: IDs Column F: Timestamp
Column G: =INDEX(A:A,MATCH(1,--(B:B=E3)*--(ABS(C:C-F3)<=3),0),1)
Goal: Append "States" Column to Data Set 2 matched on IDs and Timestamp (+/- 3 time units) match.
Just don't know how to run this formula on very large data sets.
Upvotes: 1
Views: 1226
Reputation: 14764
Place the following VBA routines in a standard code module.
Run the MIAB1290() routine.
This emulates the precise outcome of your INDEX/MATCH formula, but it is much more efficient. On my computer, a million records are correctly correlated and the results displayed in Column G in just 10 seconds.
Public Sub MIAB1290()
Dim lastB&, k&, e, f, z, v, w, vErr, r As Range
With [a2]
Set r = .Resize(.Item(.Parent.Rows.Count - .Row + 1, 5).End(xlUp).Row - .Row + 1, .Item(, .Parent.Columns.Count - .Column + 1).End(xlToLeft).Column - .Column + 1)
lastB = .Item(.Parent.Rows.Count - .Row + 1, 2).End(xlUp).Row - .Row + 1
End With
With r
.Worksheet.Sort.SortFields.Clear
.Sort Key1:=.Item(1, 2), Order1:=1, Key2:=.Item(1, 2), Order2:=1, Header:=xlYes
v = .Value2
End With
ReDim w(1 To UBound(v), 1 To 1)
vErr = CVErr(xlErrNA)
For k = 2 To UBound(v)
e = v(k, 5)
f = v(k, 6)
w(k, 1) = vErr
z = BSearch(v, 2, e, 1, lastB)
If z Then
Do While v(z, 2) = e
If Abs(v(z, 3) - f) <= 3 Then
w(k, 1) = v(z, 1)
Exit Do
End If
z = z + 1
If z > UBound(v) Then Exit Do
Loop
End If
Next
r(1, 8).Resize(r.Rows.Count) = w
End Sub
Private Function BSearch(vA, col&, vVal, ByVal first&, ByVal last&)
Dim k&, middle&
While last >= first
middle = (last + first) / 2
Select Case True
Case vVal < vA(middle, col)
last = middle - 1
Case vVal > vA(middle, col)
first = middle + 1
Case Else
k = middle - 1
Do While vA(k, col) = vA(middle, col)
k = k - 1
If k > last Then Exit Do
Loop
BSearch = k + 1
Exit Function
End Select
Wend
BSearch = 0
End Function
Upvotes: 1
Reputation: 67
Excel isn't really made for large ammount of data, and probably no code will do it faster for you then a builtin excel formula. In this case, I would sugest you to give a try to the PowerPivot addin, and see how it handles the situation.
Upvotes: 0