anon
anon

Reputation:

SQL Server 2014: Pairing rows from 2 tables based on values coming from a third one

I have 2 tables that contains typed events over time.

The first table #T1 contains events that always comes before events in the second table #T2.

A third table #E contains records that defines for an event the values that comes in #T1 and #T2 respectively.

Sample data:

CREATE TABLE #T1 
(
    EventTimestamp DateTime, 
    VehicleId int, 
    EventId varchar(50), 
    EventValue varchar(50)
);

CREATE TABLE #T2 
(
    EventTimestamp DateTime, 
    VehicleId int, 
    EventId varchar(50), 
    EventValue varchar(50)
);

CREATE TABLE #E 
(
     EventId varchar(50), 
     FirstValue int, 
     LastValue varchar(50)
);

INSERT INTO #T1(EventTimestamp, VehicleId , EventId, EventValue)
VALUES (GETDATE(), 1, 'TwigStatus', '12'),
       (GETDATE(), 2, 'SafeProtectEvent', '5')

INSERT INTO #T2(EventTimestamp, VehicleId , EventId, EventValue)
VALUES (DATEADD(second, 30, GETDATE()), 1, 'TwigStatus', '7'),
       (DATEADD(second, 30, GETDATE()), 2, 'SafeProtectEvent', '6')

INSERT INTO #E(EventId, FirstValue, LastValue)
VALUES ('TwigStatus', '12', '7'),
       ('SafeProtectEvent', '5', '6')

DECLARE @EventId varchar(50)  = 'TwigStatus';
DECLARE @FirstValue varchar(50) = '12';
DECLARE @LastValue varchar(50) = '7';

WITH ord AS 
(
    SELECT
        first, last,
        EventNr = ROW_NUMBER() OVER (ORDER BY first) 
    FROM
        (SELECT 
             first = t1.EventTimestamp, last = t2.EventTimestamp,
             rn = ROW_NUMBER() OVER (PARTITION BY t1.VehicleId ORDER BY t2.EventTimestamp) 
        FROM
            #T1 t1
        INNER JOIN 
            #T2 t2 ON t2.EventTimestamp > t1.EventTimestamp  
                   AND t2.EventValue = @LastValue
        WHERE 
            t1.EventId = @EventId AND t1.EventValue = @FirstValue) ids
    WHERE 
        rn = 1
)
SELECT
    t.VehicleId, o.first, o.last, t.EventId, t.EventValue 
FROM
    #T2 t
INNER JOIN 
    ord o ON t.EventTimestamp >= o.first 
          AND t.EventTimestamp <= o.last;
WHERE t.EventId = @EventId;    

DROP TABLE #E;
DROP TABLE #T1;
DROP TABLE #T2;

Basically, for a record in table E you see that for eventID 'TwigStatus' the value '12' should come first in table T1 and then '7' should be next in table T2. There is a second event sequence that is defined.

The VehicleId column is the link between the tables T1 and T2.

I need to compute the delay between two matching events in table T1 and T2.

To start simple, I do not use the table E yet, I'm using variables that contains predefined values and I'm returning timestamps.

But the result of the query above;

VehicleId   first                   last                    EventId             EventValue
1           2020-09-15 16:00:37.670 2020-09-15 16:01:07.670 TwigStatus          7
2           2020-09-15 16:00:37.670 2020-09-15 16:01:07.670 SafeProtectEvent    6

Is not what I'm expecting because the EventId 'SafeProtectEvent' Should be filtered out for now.

So I have 2 questions:

  1. How to avoid the second event to show with the actual query.
  2. How to work with the content of the table E and get rid of variables to process event sequences.

EDIT 1: Problem 1 Solved by adding a restriction on the query (see above)

Upvotes: 0

Views: 85

Answers (1)

seanb
seanb

Reputation: 6685

Update/new version below - now allows rows in T1 without matching rows in T2.

Based on discussion on comments below, I have updated this suggestion.

This code replaces everything from the DECLARE @EventId to the end of that SELECT statement.

Logic is as follows - for each row in T1 ...

  1. Determine the time boundaries for that row in T1 (between its EventTimestamp, and the next EventTimestamp in T1 for that vehicle; or 1 day in the future if there is no next event)
  2. Find the matching rows in T2, where 'matching' means a) same VehicleId, b) same EventId, c) EventValue is limited by possibilities in #E, and d) occurs within the time boundaries of T1
  3. Find the first of these rows, if available
  4. Calculate EventDelay as the times between the two timestamps
; WITH t1 AS
        (SELECT     VehicleId, 
                    EventTimestamp, 
                    EventId,
                    EventValue,
                    COALESCE(LEAD(EventTimestamp, 1) OVER (PARTITION BY VehicleID ORDER BY EventTimestamp), DATEADD(day, 1, getdate())) AS NextT1_EventTimeStamp
            FROM    #T1
        ),
   ord AS
        (SELECT     t1.VehicleId, 
                    t1.EventTimestamp AS first, 
                    t2.EventTimestamp AS last,
                    t1.EventId, 
                    t2.EventValue,
                    ROW_NUMBER() OVER (PARTITION BY t1.VehicleId, t1.EventTimestamp, t1.EventId ORDER BY t2.EventTimestamp) AS rn
            FROM    t1
                    LEFT OUTER JOIN #E  AS e ON t1.EventId = e.EventId 
                                       AND t1.EventValue = e.FirstValue
                    LEFT OUTER JOIN #T2 AS t2 ON t1.VehicleID = t2.VehicleID 
                                       AND t1.EventID = t2.EventID
                                       AND t2.eventId = e.EventId 
                                       AND t2.EventValue = e.LastValue
                                       AND t2.EventTimestamp > t1.EventTimestamp
                                       AND t2.EventTimestamp < NextT1_EventTimeStamp
        )
    SELECT      VehicleId, first, last, EventId, EventValue,
                DATEDIFF(second, first, last) AS EventDelay
        FROM    ord
        WHERE   rn = 1

The ever-growing DB<>fiddle has the latest updates, as well as original posts and previous suggestions.

Upvotes: 1

Related Questions