Reputation: 94
I have data this way
Column1 Column2 Date
d1 b1 1/1/2020
d1 b2 1/1/2020
d1 b3 1/1/2020
d2 b2 1/2/2020
d1 b3 1/2/2020
d1 b4 1/2/2020
d1 b1 1/3/2020
d2 b3 1/3/2020
d2 b3 1/4/2020
The grain of the above data is Col1, Col2 For all this grain, if the data at grain level does not come on next day, I need to repeat from previous day. Meaning if I run the query toady and today is 1/4/2020, the expected output is as follows
Column1 Column2 Date Status
d1 b1 1/1/2020 FromInput
d1 b1 1/2/2020 Repeated
d1 b1 1/3/2020 FromInput
d1 b1 1/4/2020 FromInput
d1 b2 1/1/2020 FromInput
d1 b2 1/2/2020 Repeated
d1 b2 1/3/2020 Repeated
d1 b2 1/4/2020 Repeated
d1 b3 1/1/2020 FromInput
d1 b3 1/2/2020 FromInput
d1 b3 1/3/2020 Repeated
d1 b3 1/4/2020 Repeated
d1 b4 1/2/2020 FromInput
d1 b4 1/3/2020 Repeated
d1 b4 1/4/2020 Repeated
d2 b2 1/2/2020 FromInput
d2 b2 1/3/2020 Repeated
d2 b2 1/4/2020 Repeated
d2 b3 1/3/2020 FromInput
d2 b3 1/4/2020 FromInput
Upvotes: 0
Views: 141
Reputation: 222702
You can simply cross join
the distinct values of col1
, col2
and date
to generate all possible combinations, and then bring the original table with a left join
. The rest is just a little conditional logic in the select
clause to compute the status
:
select
c1.col1,
c2.col2,
d.date,
case when t.col1 is null then 'Repeated' else 'From Input' end status
from (select distinct col1 from mytable) c1
cross join (select distinct col2 from mytable) c2
cross join (select distinct date from mytable) d
left join mytable t
on t.col1 = c1.col1
and t.col2 = c2.col2
and t.date = c3.date
Upvotes: 0
Reputation: 173190
Below is for BigQuery Standard SQL
#standardSQL
WITH days AS (
SELECT column1, column2, day FROM (
SELECT column1, column2,
MIN(PARSE_DATE('%m/%d/%Y', day)) min_day,
PARSE_DATE('%m/%d/%Y', '1/4/2020') max_day -- or just CURRENT_DATE() in real use case
FROM `project.dataset.table`
GROUP BY column1, column2
),
UNNEST(GENERATE_DATE_ARRAY(min_day, max_day)) day
)
SELECT d.column1, d.column2, d.day,
IFNULL(t.value,
LAST_VALUE(value IGNORE NULLS)
OVER(PARTITION BY d.column1, d.column2 ORDER BY d.day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) value,
IF(t.value IS NULL, 'Repeated', 'From Input') Status
FROM days d
LEFT JOIN `project.dataset.table` t
ON d.column1 = t.column1
AND d.column2 = t.column2
AND d.day = PARSE_DATE('%m/%d/%Y', t.day)
-- ORDER BY column1, column2, day
You can test, play with above using sample data from your question as in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'd1' column1, 'b1' column2, '1/1/2020' day, 1 value UNION ALL
SELECT 'd1', 'b2', '1/1/2020', 3 UNION ALL
SELECT 'd1', 'b3', '1/1/2020', 4 UNION ALL
SELECT 'd2', 'b2', '1/2/2020', 7 UNION ALL
SELECT 'd1', 'b3', '1/2/2020', 5 UNION ALL
SELECT 'd1', 'b4', '1/2/2020', 6 UNION ALL
SELECT 'd1', 'b1', '1/3/2020', 2 UNION ALL
SELECT 'd2', 'b3', '1/3/2020', 8 UNION ALL
SELECT 'd2', 'b3', '1/4/2020', 9
), days AS (
SELECT column1, column2, day FROM (
SELECT column1, column2,
MIN(PARSE_DATE('%m/%d/%Y', day)) min_day,
PARSE_DATE('%m/%d/%Y', '1/4/2020') max_day -- or just CURRENT_DATE() in real use case
FROM `project.dataset.table`
GROUP BY column1, column2
),
UNNEST(GENERATE_DATE_ARRAY(min_day, max_day)) day
)
SELECT d.column1, d.column2, d.day,
IFNULL(t.value,
LAST_VALUE(value IGNORE NULLS)
OVER(PARTITION BY d.column1, d.column2 ORDER BY d.day ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) value,
IF(t.value IS NULL, 'Repeated', 'From Input') Status
FROM days d
LEFT JOIN `project.dataset.table` t
ON d.column1 = t.column1
AND d.column2 = t.column2
AND d.day = PARSE_DATE('%m/%d/%Y', t.day)
-- ORDER BY column1, column2, day
with output
Row column1 column2 day value Status
1 d1 b1 2020-01-01 1 From Input
2 d1 b1 2020-01-02 1 Repeated
3 d1 b1 2020-01-03 2 From Input
4 d1 b1 2020-01-04 2 Repeated
5 d1 b2 2020-01-01 3 From Input
6 d1 b2 2020-01-02 3 Repeated
7 d1 b2 2020-01-03 3 Repeated
8 d1 b2 2020-01-04 3 Repeated
9 d1 b3 2020-01-01 4 From Input
10 d1 b3 2020-01-02 5 From Input
11 d1 b3 2020-01-03 5 Repeated
12 d1 b3 2020-01-04 5 Repeated
13 d1 b4 2020-01-02 6 From Input
14 d1 b4 2020-01-03 6 Repeated
15 d1 b4 2020-01-04 6 Repeated
16 d2 b2 2020-01-02 7 From Input
17 d2 b2 2020-01-03 7 Repeated
18 d2 b2 2020-01-04 7 Repeated
19 d2 b3 2020-01-03 8 From Input
20 d2 b3 2020-01-04 9 From Input
Upvotes: 2