MIAB1290

Reputation: 13

INDEX MATCH array formula for 1M rows

I have two sets of data that need to be matched on ID and timestamp (+/- 3 units, converted from time). Below is the array formula I've been using in Excel to do the matching. Recently I've had to run it on up to 1 million rows, where it takes a REALLY long time and crashes too. Is there a faster way to do this, in Excel or outside it?

=INDEX(A:A,MATCH(1,--(B:B=E3)*--(ABS(C:C-F3)<=3),0),1)

Data Set 1: Column A: States, Column B: IDs, Column C: Timestamp

Data Set 2: Column D: Email Addresses, Column E: IDs, Column F: Timestamp

Column G: =INDEX(A:A,MATCH(1,--(B:B=E3)*--(ABS(C:C-F3)<=3),0),1)

Goal: Append the "States" column to Data Set 2, matched on IDs and Timestamp (+/- 3 time units).

Just don't know how to run this formula on very large data sets.
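For reference, here is roughly what the formula does, sketched in Python rather than Excel. The function name, the tuple layout, and the `tolerance` parameter are assumptions made for illustration only. Every row of Data Set 2 scans all of Data Set 1 until it finds the first row that passes both tests, which is presumably why it scales so badly at 1M rows.

```python
def match_states_bruteforce(set1, set2, tolerance=3):
    """Brute-force equivalent of the INDEX/MATCH array formula.

    set1: list of (state, id, timestamp) tuples  (columns A, B, C)
    set2: list of (email, id, timestamp) tuples  (columns D, E, F)
    Returns one state per row of set2, or None where the formula gives #N/A.
    """
    results = []
    for _email, id2, ts2 in set2:
        found = None
        # Like MATCH(1, (B:B=E3)*(ABS(C:C-F3)<=3), 0): take the first row
        # of set1 that satisfies both conditions.
        for state, id1, ts1 in set1:
            if id1 == id2 and abs(ts1 - ts2) <= tolerance:
                found = state
                break
        results.append(found)
    return results
```

With n rows in each set this does on the order of n * n comparisons in the worst case.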

Upvotes: 1

Views: 1226

Answers (2)

Excel Hero

Reputation: 14764

Place the following VBA routines in a standard code module.

Run the MIAB1290() routine.

This emulates the precise outcome of your INDEX/MATCH formula, but it is much more efficient. On my computer, a million records are correctly correlated and the results displayed in Column G in just 10 seconds.

Public Sub MIAB1290()

    Dim lastB&, k&, e, f, z, v, w, vErr, r As Range

    With [a2]
        Set r = .Resize(.Item(.Parent.Rows.Count - .Row + 1, 5).End(xlUp).Row - .Row + 1, .Item(, .Parent.Columns.Count - .Column + 1).End(xlToLeft).Column - .Column + 1)
        lastB = .Item(.Parent.Rows.Count - .Row + 1, 2).End(xlUp).Row - .Row + 1
    End With

    With r
        .Worksheet.Sort.SortFields.Clear
        .Sort Key1:=.Item(1, 2), Order1:=1, Key2:=.Item(1, 2), Order2:=1, Header:=xlYes
        v = .Value2
    End With

    ReDim w(1 To UBound(v), 1 To 1)
    vErr = CVErr(xlErrNA)

    For k = 2 To UBound(v)
        e = v(k, 5)
        f = v(k, 6)
        w(k, 1) = vErr
        ' Search the data rows only; row 1 of v is the header row.
        z = BSearch(v, 2, e, 2, lastB)
        If z Then
            Do While v(z, 2) = e
                If Abs(v(z, 3) - f) <= 3 Then
                    w(k, 1) = v(z, 1)
                    Exit Do
                End If
                z = z + 1
                If z > UBound(v) Then Exit Do
            Loop
        End If
    Next

    ' Write the results to Column G (the 7th column of r).
    r(1, 7).Resize(r.Rows.Count) = w

End Sub


Private Function BSearch(vA, col&, vVal, ByVal first&, ByVal last&)
    Dim k&, middle&
    While last >= first
        middle = first + (last - first) \ 2
        Select Case True
            Case vVal < vA(middle, col)
                last = middle - 1
            Case vVal > vA(middle, col)
                first = middle + 1
            Case Else
                ' Walk back to the first row that holds this value.
                k = middle - 1
                Do While vA(k, col) = vA(middle, col)
                    k = k - 1
                    If k < first Then Exit Do
                Loop
                BSearch = k + 1
                Exit Function
        End Select
    Wend
    BSearch = 0
End Function
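If you would rather do this outside Excel, the same idea (sort once by ID, binary-search to the first row with that ID, then scan forward while the ID still matches) can be sketched in Python. This is only an illustration under assumptions: the function name, the tuple layout, and the `tolerance` parameter are hypothetical, and it has not been timed on a million rows.

```python
from bisect import bisect_left


def match_states_sorted(set1, set2, tolerance=3):
    """Sort-then-binary-search version of the ID/timestamp match.

    set1: list of (state, id, timestamp) tuples  (columns A, B, C)
    set2: list of (email, id, timestamp) tuples  (columns D, E, F)
    Returns one state per row of set2, or None where there is no match.
    """
    # Sort once by ID only. Python's sort is stable, so rows that share an
    # ID keep their original relative order and the first match still wins.
    rows = sorted(set1, key=lambda row: row[1])
    ids = [row[1] for row in rows]

    results = []
    for _email, id2, ts2 in set2:
        found = None
        i = bisect_left(ids, id2)  # first position whose ID is >= id2
        # Scan only the run of rows that carry this ID.
        while i < len(rows) and ids[i] == id2:
            if abs(rows[i][2] - ts2) <= tolerance:
                found = rows[i][0]
                break
            i += 1
        results.append(found)
    return results
```

Each lookup costs a binary search plus a scan over the rows sharing that ID, instead of a scan over the whole of Data Set 1.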

Upvotes: 1

Israel Holetz

Reputation: 67

Excel isn't really made for large amounts of data, and probably no code will do it faster for you than a built-in Excel formula. In this case, I would suggest trying the PowerPivot add-in to see how it handles the situation.

Upvotes: 0
