slim88
slim88

Reputation: 63

SQL/Teradata: How to return specific values and their preceding rows?

I have a SQL (Teradata) problem that I haven't been able to work through. I know the answer is probably much simpler than I'm making it seem.

I have a group of code like this:

ID      timestamp         location      event_type
1111    20160601-0112     Detroit       Event A
1111    20160602-0954     Brooklyn      Event B
1111    20160602-1123     Brooklyn      Event A
1112    20160912-1420     Minneapolis   Event B
1113    20161123-1742     New Orleans   Event A
1113    20161124-1841     New Orleans   Event A
1113    20161124-2100     New Orleans   Event B
1114    20170201-0959     Detroit       Event A
1114    20170201-2350     Detroit       Event A

Here are the conditions for what I need to return:

I want to return the FIRST event B per ID, and the nearest Event A that happened BEFORE that event B (based on timestamp). Therefore, for the dataset above, I would get:

ID      timestamp         location      event_type
1111    20160601-0112     Detroit       Event A
1111    20160602-0954     Brooklyn      Event B
1113    20161124-1841     New Orleans   Event A
1113    20161124-2100     New Orleans   Event B

The third record for 1111 doesn't get returned because it happened after the Event B. ID 1112 doesn't get returned because it doesn't have an Event A preceding it. The first record of 1113 doesn't get returned because there is closer event A (to the event B) after it. 1114 doesn't get returned because there is no Event B.

I have been working on this for hours to the point where I'm no longer approaching it clearly...any help would be greatly appreciated!

Upvotes: 0

Views: 1545

Answers (2)

JNevill
JNevill

Reputation: 50019

Given your sample data, I think the following should do the trick.

SELECT *
FROM testtable
QUALIFY

    (
        event_type = 'Event A'
        AND
        min(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
    ) OR
    (
        event_type = 'Event B'
        AND
        max(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
    )

Here we are using Window functions to test records before and after the record in the result set. We do this in the QUALIFY clause which acts like a WHERE clause but for Window Functions.

Breaking down this qualify statement:

        event_type = 'Event A'
        AND
        min(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'

Is saying "If this current record is "Event A" AND the very next record when ordered by timestamp for this ID is "Event B" then allow the record".

        event_type = 'Event B'
        AND
        max(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'

Is saying "If this current record is "Event B" AND the previous record when ordered by timestamp for this ID is "Event A" then allow the record.

It may be necessary to get more creating in that QUALIFY clause to catch edge cases, but once you wrap your head around how it all works you can get pretty creative in there.


Example:

CREATE MULTISET VOLATILE TABLE testtable
(
 id int,
 ts varchar(20),
 location varchar(20),
 event_type varchar(20)

) PRIMARY INDEX (id) ON COMMIT PRESERVE ROWS;

INSERT INTO testtable VALUES (1111,'20160601-0112','Detroit','Event A');
INSERT INTO testtable VALUES (1111,'20160602-0954','Brooklyn','Event B');
INSERT INTO testtable VALUES (1111,'20160602-1123','Brooklyn','Event A');
INSERT INTO testtable VALUES (1112,'20160912-1420','Minneapolis','Event B');
INSERT INTO testtable VALUES (1113,'20161123-1742','New Orleans','Event A');
INSERT INTO testtable VALUES (1113,'20161124-1841','New Orleans','Event A');
INSERT INTO testtable VALUES (1113,'20161124-2100','New Orleans','Event B');
INSERT INTO testtable VALUES (1114,'20170201-0959','Detroit','Event A');
INSERT INTO testtable VALUES (1114,'20170201-2350','Detroit','Event A');


SELECT *
FROM testtable
QUALIFY

    (
        event_type = 'Event A'
        AND
        min(event_type) OVER (PARTITION BY id ORDER BY ts ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
    ) OR
    (
        event_type = 'Event B'
        AND
        max(event_type) OVER (PARTITION BY id ORDER BY ts ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
    );

+------+---------------+-------------+------------+
|  id  |      ts       |  location   | event_type |
+------+---------------+-------------+------------+
| 1111 | 20160601-0112 | Detroit     | Event A    |
| 1111 | 20160602-0954 | Brooklyn    | Event B    |
| 1113 | 20161124-1841 | New Orleans | Event A    |
| 1113 | 20161124-2100 | New Orleans | Event B    |
+------+---------------+-------------+------------+

Upvotes: 2

West
West

Reputation: 1

This worked with MS SQL 2012 - not sure if the syntax is the same?

;WITH myData AS
    (
        SELECT
            ID,
            min(timestamp) as timestamp,
            location,
            event_type
        FROM
            tableName
        GROUP BY
            ID,
            location,
            event_type
    )
        SELECT
            *
        FROM
            myData
        WHERE
            (
                myData.timestamp < (SELECT top 1 m2.timestamp FROM myData m2 WHERE m2.ID = myData.ID AND m2.event_type = 'Event B' ORDER BY m2.timestamp ASC)
                OR
                myData.event_type = 'Event B'
            )
            AND (SELECT Count(*) FROM myData m2 WHERE m2.ID = myData.ID AND m2.event_type = 'Event A') > 0
        ORDER BY
            myData.timestamp

Upvotes: 0

Related Questions