Reputation: 159
I have several time series (one for each unit in the example below, which focuses on a single unit) with factors that change with no pattern (factors change at different times for different units). The data are updated each week and, when the factor changes, the entire time series for that unit has to be updated. I am trying to find a tidy/efficient way to identify (for each unit) the weeks during which the factor changed or when the factor differs from any factor in the previous week. The example below "works," but doesn't seem like the "best" way to accomplish this.
I am using Starburst SQL.
Example data and "minimal" working example
DROP TABLE IF EXISTS factor_daily ;
CREATE TABLE factor_daily (
unit varchar(2),
factor int,
effdate date
);
INSERT INTO factor_daily VALUES ('A', 1, date'2024-01-02');
INSERT INTO factor_daily VALUES ('A', 1, date'2024-01-03');
INSERT INTO factor_daily VALUES ('A', 1, date'2024-01-04');
INSERT INTO factor_daily VALUES ('A', 1, date'2024-01-05');
INSERT INTO factor_daily VALUES ('A', 1, date'2024-01-08');
INSERT INTO factor_daily VALUES ('A', 1, date'2024-01-09');
INSERT INTO factor_daily VALUES ('A', 1, date'2024-01-10');
INSERT INTO factor_daily VALUES ('A', 5, date'2024-01-11');
INSERT INTO factor_daily VALUES ('A', 5, date'2024-01-12');
INSERT INTO factor_daily VALUES ('A', 5, date'2024-01-16');
INSERT INTO factor_daily VALUES ('A', 5, date'2024-01-17');
INSERT INTO factor_daily VALUES ('A', 5, date'2024-01-18');
INSERT INTO factor_daily VALUES ('A', 5, date'2024-01-19');
INSERT INTO factor_daily VALUES ('A', 5, date'2024-01-22');
INSERT INTO factor_daily VALUES ('A', 5, date'2024-01-23');
INSERT INTO factor_daily VALUES ('A', 5, date'2024-01-24');
INSERT INTO factor_daily VALUES ('A', 5, date'2024-01-25');
INSERT INTO factor_daily VALUES ('A', 5, date'2024-01-26');
INSERT INTO factor_daily VALUES ('A', 10, date'2024-01-29');
INSERT INTO factor_daily VALUES ('A', 10, date'2024-01-30');
INSERT INTO factor_daily VALUES ('A', 10, date'2024-01-31');
INSERT INTO factor_daily VALUES ('A', 10, date'2024-02-01');
INSERT INTO factor_daily VALUES ('A', 10, date'2024-02-02');
Given this data, I add a column to identify the week.
DROP TABLE IF EXISTS factor_wk ;
CREATE TABLE factor_wk AS
SELECT *, date_trunc('week', effdate) AS dt_wk
FROM factor_daily
;
I then create the following table to identify when the factor is not constant within the week or differs from the previous week.
DROP TABLE IF EXISTS delta_factor_wkly ;
CREATE TABLE delta_factor_wkly AS
SELECT *,
IF((factor_min_wk <> factor_max_wk) -- change during that week
OR (least(lag_factor_min_wk, lag_factor_max_wk) <> least(factor_min_wk, factor_max_wk)) -- change from previous week
, 1, 0) AS delta
FROM
(
SELECT *,
lag(factor_min_wk) OVER (PARTITION BY unit ORDER BY dt_wk) AS lag_factor_min_wk,
lag(factor_max_wk) OVER (PARTITION BY unit ORDER BY dt_wk) AS lag_factor_max_wk
FROM
(
SELECT unit, dt_wk,
min(factor) AS factor_min_wk,
max(factor) AS factor_max_wk
FROM
(
SELECT unit,
effdate,
factor,
date_trunc('week', effdate) AS dt_wk
FROM factor_daily
)
GROUP BY 1, 2
)
)
;
I then join the two tables.
SELECT l.*, r.delta
FROM factor_wk AS l
LEFT JOIN
delta_factor_wkly AS r
ON l.unit = r.unit
AND l.dt_wk = r.dt_wk
ORDER BY unit, effdate
;
With this last join, I can identify the units that need to be updated.
However, there must be a cleaner and/or more efficient way to do this.
Upvotes: 0
Views: 51
Reputation: 3293
You are running the same query twice. Replace the second occurrence of
SELECT unit, effdate, factor, date_trunc('week', effdate) AS dt_wk
FROM factor_daily
with factor_wk
unit | dt_wk | factor_min_wk | factor_max_wk | lag_factor_min_wk | lag_factor_max_wk | delta |
---|---|---|---|---|---|---|
A | 1 | 1 | 1 | null | null | 0 |
A | 2 | 1 | 5 | 1 | 1 | 1 |
A | 3 | 5 | 5 | 1 | 5 | 1 |
A | 4 | 5 | 5 | 5 | 5 | 0 |
A | 5 | 10 | 10 | 5 | 5 | 1 |
Upvotes: 0