Reputation: 1
I have tag data that comes in individual rows with 3 types of events: Created, Modified and Deleted.
Tag data has the action date as well as the tag_name and if there is a tag_value. Modified events are when the tag is simply changing values and not deleted. The other 2 events show creation and deleted dates.
I want each "tag" to be one row with a CreatedDT, ModifiedDT and DeletedDT. From there I can have a isActive flag based on if there is a DeletedDT. I want to accomplish this without using massive windows/partitions as this dataset can get large (100 million+). Tags CAN be created and deleted multiple times, and this is where some of the logic gets complicated since you need to match the created and deleteddt with each other for that specific version of the tag.
I was experimenting with Snowflake Cursors to iterate through the events in sequential order. This option tends to be quite slow but it does work. Takes about a minute to get through 150-200 tags which is way too slow. I am more looking for an interesting way to do this, because there has to be a fun way to do this without window functions partitioned by each player and tag.
Example transactional data:
PlayerID | PROPERTY | TAGNAME | EVENTDATE | TAGACTION | LOGCODE | NEWVALUE | OLDVALUE |
---|---|---|---|---|---|---|---|
1 | 7 | testtag1 | 10/25/24 14:10 | added | 68529384303 | - | null |
2 | 4 | Testtag2 | 10/25/24 14:31 | changed | 51717448005 | 0.9 | 0.93 |
3 | 7 | testtag3 | 10/25/24 14:04 | added | 68530352103 | casino | null |
4 | 3 | testtag4 | 10/25/24 14:02 | added | 156526846142 | Yes | null |
5 | 4 | testtag5 | 10/25/24 14:08 | removed | 51714702840 | null | null |
Example summary data:
PLAYERID | TAGNAME | TAGVALUE | CREATEDDT | UPDATEDDT | DELETEDDT | ISACTIVE |
---|---|---|---|---|---|---|
1 | Testtag1 | test1 | 8/7/24 12:00 | null | null | TRUE |
2 | Testtag2 | test1 | 7/22/24 11:52 | null | null | TRUE |
3 | Testtag3 | test1 | 7/22/24 11:52 | null | null | TRUE |
4 | Testtag4 | test1 | 8/25/24 0:03 | null | 9/1/24 0:02 | FALSE |
5 | Testtag5 | test1 | 9/24/24 0:06 | null | 10/1/24 0:02 | FALSE |
6 | Testtag6 | test1 | 10/25/24 0:06 | null | 9/1/24 0:02 | FALSE |
Example destination table:
CREATE OR REPLACE TABLE Tag_Tracking (
TagID INT AUTOINCREMENT PRIMARY KEY,
PLAYERID STRING,
TAGNAME STRING,
TagValue STRING,
CreatedDT TIMESTAMP_NTZ,
UpdatedDT TIMESTAMP_NTZ,
DeletedDT TIMESTAMP_NTZ,
IsActive BOOLEAN DEFAULT TRUE
);
Cursor code I have now:
DECLARE
v_cursor CURSOR FOR
SELECT
PLAYERID,
TAGNAME,
TAGACTION,
NewValue,
EVENTDATE
from raw_tag_changes
where eventdate BETWEEN '2024-10-24 18:00:00' AND '2024-10-24 19:00:00'
ORDER BY PLAYERID, TAGNAME, EVENTDATE ASC;
BEGIN
FOR record IN v_cursor DO
-- Assign cursor values to helper variables with explicit types
LET v_playerid STRING := record.PLAYERID;
LET v_tagname STRING := record.TAGNAME;
LET v_tagaction STRING := record.TAGACTION;
LET v_tagvalue STRING := record.NEWVALUE;
LET v_eventdate TIMESTAMP := record.EVENTDATE;
CASE
WHEN :v_tagaction = 'added' THEN
-- Use MERGE to insert a new row only if no active instance exists
MERGE INTO Tag_Tracking AS tt
USING (
SELECT :v_playerid AS PLAYERID, :v_tagname AS TAGNAME, :v_eventdate AS EVENTDATE
) AS src
ON tt.PLAYERID = src.PLAYERID
AND tt.TAGNAME = src.TAGNAME
AND tt.CreatedDT = src.EVENTDATE
WHEN NOT MATCHED THEN
INSERT (PLAYERID, TAGNAME, TagValue, CreatedDT, IsActive)
VALUES (:v_playerid, :v_tagname, :v_tagvalue, :v_eventdate, TRUE);
WHEN :v_tagaction = 'changed' THEN
-- Update the TagValue of the active instance if "changed"
UPDATE Tag_Tracking
SET TagValue = :v_tagvalue
WHERE PLAYERID = :v_playerid
AND TAGNAME = :v_tagname
AND IsActive = TRUE;
WHEN :v_tagaction = 'removed' THEN
-- Set DeletedDT and deactivate the tag instance if "removed"
UPDATE Tag_Tracking
SET DeletedDT = :v_eventdate, IsActive = FALSE
WHERE PLAYERID = :v_playerid
AND TAGNAME = :v_tagname
AND IsActive = TRUE;
END CASE;
END FOR;
Upvotes: 0
Views: 79