Reputation: 63
I have a SQL (Teradata) problem that I haven't been able to work through. I know the answer is probably much simpler than I'm making it seem.
I have a group of code like this:
ID timestamp location event_type
1111 20160601-0112 Detroit Event A
1111 20160602-0954 Brooklyn Event B
1111 20160602-1123 Brooklyn Event A
1112 20160912-1420 Minneapolis Event B
1113 20161123-1742 New Orleans Event A
1113 20161124-1841 New Orleans Event A
1113 20161124-2100 New Orleans Event B
1114 20170201-0959 Detroit Event A
1114 20170201-2350 Detroit Event A
Here are the conditions for what I need to return:
I want to return the FIRST event B per ID, and the nearest Event A that happened BEFORE that event B (based on timestamp). Therefore, for the dataset above, I would get:
ID timestamp location event_type
1111 20160601-0112 Detroit Event A
1111 20160602-0954 Brooklyn Event B
1113 20161124-1841 New Orleans Event A
1113 20161124-2100 New Orleans Event B
The third record for 1111 doesn't get returned because it happened after the Event B. ID 1112 doesn't get returned because it doesn't have an Event A preceding it. The first record of 1113 doesn't get returned because there is closer event A (to the event B) after it. 1114 doesn't get returned because there is no Event B.
I have been working on this for hours to the point where I'm no longer approaching it clearly...any help would be greatly appreciated!
Upvotes: 0
Views: 1545
Reputation: 50019
Given your sample data, I think the following should do the trick.
SELECT *
FROM testtable
QUALIFY
(
event_type = 'Event A'
AND
min(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
) OR
(
event_type = 'Event B'
AND
max(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
)
Here we are using Window functions to test records before and after the record in the result set. We do this in the QUALIFY clause which acts like a WHERE clause but for Window Functions.
Breaking down this qualify statement:
event_type = 'Event A'
AND
min(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
Is saying "If this current record is "Event A" AND the very next record when ordered by timestamp for this ID is "Event B" then allow the record".
event_type = 'Event B'
AND
max(event_type) OVER (PARTITION BY id ORDER BY "timestamp" ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
Is saying "If this current record is "Event B" AND the previous record when ordered by timestamp for this ID is "Event A" then allow the record.
It may be necessary to get more creating in that QUALIFY clause to catch edge cases, but once you wrap your head around how it all works you can get pretty creative in there.
Example:
CREATE MULTISET VOLATILE TABLE testtable
(
id int,
ts varchar(20),
location varchar(20),
event_type varchar(20)
) PRIMARY INDEX (id) ON COMMIT PRESERVE ROWS;
INSERT INTO testtable VALUES (1111,'20160601-0112','Detroit','Event A');
INSERT INTO testtable VALUES (1111,'20160602-0954','Brooklyn','Event B');
INSERT INTO testtable VALUES (1111,'20160602-1123','Brooklyn','Event A');
INSERT INTO testtable VALUES (1112,'20160912-1420','Minneapolis','Event B');
INSERT INTO testtable VALUES (1113,'20161123-1742','New Orleans','Event A');
INSERT INTO testtable VALUES (1113,'20161124-1841','New Orleans','Event A');
INSERT INTO testtable VALUES (1113,'20161124-2100','New Orleans','Event B');
INSERT INTO testtable VALUES (1114,'20170201-0959','Detroit','Event A');
INSERT INTO testtable VALUES (1114,'20170201-2350','Detroit','Event A');
SELECT *
FROM testtable
QUALIFY
(
event_type = 'Event A'
AND
min(event_type) OVER (PARTITION BY id ORDER BY ts ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) = 'Event B'
) OR
(
event_type = 'Event B'
AND
max(event_type) OVER (PARTITION BY id ORDER BY ts ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) = 'Event A'
);
+------+---------------+-------------+------------+
| id | ts | location | event_type |
+------+---------------+-------------+------------+
| 1111 | 20160601-0112 | Detroit | Event A |
| 1111 | 20160602-0954 | Brooklyn | Event B |
| 1113 | 20161124-1841 | New Orleans | Event A |
| 1113 | 20161124-2100 | New Orleans | Event B |
+------+---------------+-------------+------------+
Upvotes: 2
Reputation: 1
This worked with MS SQL 2012 - not sure if the syntax is the same?
;WITH myData AS
(
SELECT
ID,
min(timestamp) as timestamp,
location,
event_type
FROM
tableName
GROUP BY
ID,
location,
event_type
)
SELECT
*
FROM
myData
WHERE
(
myData.timestamp < (SELECT top 1 m2.timestamp FROM myData m2 WHERE m2.ID = myData.ID AND m2.event_type = 'Event B' ORDER BY m2.timestamp ASC)
OR
myData.event_type = 'Event B'
)
AND (SELECT Count(*) FROM myData m2 WHERE m2.ID = myData.ID AND m2.event_type = 'Event A') > 0
ORDER BY
myData.timestamp
Upvotes: 0