Reputation: 407
In my data, I'm looking at quarterly counts per user. For all of the quarters that are missing, I'd like to impute them with 0. Here is an example:
Here is my current dataset:
ID Qtr Count
1 2018Q1 1
1 2018Q3 1
1 2018Q4 2
2 2018Q1 4
2 2018Q2 6
2 2019Q3 2
2 2019Q4 1
3 2018Q1 8
3 2018Q2 5
3 2018Q3 2
3 2018Q4 4
3 2019Q1 5
Updated table:
ID Qtr Count
1 2018Q1 1
1 2018Q2 0
1 2018Q3 1
1 2018Q4 2
1 2019Q1 0
1 2019Q2 0
1 2019Q3 0
1 2019Q4 0
2 2018Q1 4
2 2018Q2 6
2 2018Q3 0
2 2018Q4 0
2 2019Q1 0
2 2019Q2 0
2 2019Q3 2
2 2019Q4 1
3 2018Q1 8
3 2018Q2 5
3 2018Q3 2
3 2018Q4 4
3 2019Q1 5
3 2019Q2 0
3 2019Q3 0
3 2019Q4 0
Any help would be much appreciated.
Upvotes: 2
Views: 1273
Reputation: 280
You can create table containing all your quarters (in example quarters_
) and then join data (in example some_data) to it.
create table some_data (year number, quarter number, data varchar2(10));
insert into some_data values(2018,3,'AAAA');
create table quarters_ (signature varchar2(20), year number, quarter_number number);
insert into quarters_ (signature, year, quarter_number) values ('2018Q1', 2018, 1);
insert into quarters_ (signature, year, quarter_number) values ('2018Q2', 2018, 2);
insert into quarters_ (signature, year, quarter_number) values ('2018Q3', 2018, 3);
insert into quarters_ (signature, year, quarter_number) values ('2018Q4', 2018, 4);
insert into quarters_ (signature, year, quarter_number) values ('2019Q1', 2019, 1);
insert into quarters_ (signature, year, quarter_number) values ('2019Q2', 2019, 2);
insert into quarters_ (signature, year, quarter_number) values ('2019Q3', 2019, 3);
insert into quarters_ (signature, year, quarter_number) values ('2019Q4', 2019, 4);
select signature as Qtr, count (s.data) as count
from quarters_ q
left join some_data s on s.quarter = q.quarter_number and s.year = q.year
group by q.signature
Upvotes: 0
Reputation: 661
So, to solve this we will first all the year+quarter values for a particular ID and then we can calculate the count for each year+quarter. So the solution is divided into two steps as follows:
To create dataset the steps will be:
Get unique ID,year combination:
SELECT ID, ARRAY[substr(Qtr, 1, 4)] AS year from table_name
GROUP BY ID, ARRAY[substr(Qtr, 1, 4)]
Convert it year data to year+quarter data:
WITH dataset AS (
SELECT ID, ARRAY[substr(Qtr, 1, 4)] AS year from table_name
GROUP BY ID, ARRAY[substr(Qtr, 1, 4)]
)
select ID, transform(year, x->x||'Q1') || transform(year, x->x||'Q2') || transform(year, x->x||'Q3') ||
transform(year, x->x||'Q4')
AS year from dataset
Seperate out it array entry:
WITH dataset AS (
WITH inner_dataset AS (
SELECT ID, ARRAY[substr(Qtr, 1, 4)] AS year from table_name
GROUP BY ID, ARRAY[substr(Qtr, 1, 4)]
)
select ID, transform(year, x->x||'Q1') || transform(year, x->x||'Q2') ||
transform(year, x->x||'Q3') || transform(year, x->x||'Q4')
AS year from inner_dataset
)
SELECT ID,yr from dataset
CROSS JOIN UNNEST(year) AS t(yr)
Output of this query will be the ROWS of ID,year+quarter having all the quarters for each id. Now the next step will be to convert it into final dataset and then JOIN this data with original table.
SELECT table_name.ID,final_dataset.yr,coalesce(table_name.Count,0)
FROM table_name
CROSS JOIN final_dataset
ON table_name.ID = final_dataset.ID AND table_name.Qtr = final_dataset.yr
ORDER BY table_name.ID,final_dataset.yr
coalesce() is used to convert NULL values to 0.
WITH final_dataset AS (
WITH dataset AS (
WITH inner_dataset AS (
SELECT ID, ARRAY[substr(Qtr, 1, 4)] AS year from table_name
GROUP BY ID, ARRAY[substr(Qtr, 1, 4)]
)
select ID, transform(year, x->x||'Q1') || transform(year, x->x||'Q2') ||
transform(year, x->x||'Q3') || transform(year, x->x||'Q4')
AS year from inner_dataset
)
SELECT ID,yr from dataset
CROSS JOIN UNNEST(year) AS t(yr)
)
SELECT table_name.ID,final_dataset.yr,coalesce(table_name.Count,0)
FROM table_name
CROSS JOIN final_dataset
ON table_name.ID = final_dataset.ID AND table_name.Qtr = final_dataset.yr
ORDER BY table_name.ID,final_dataset.yr
Upvotes: 1
Reputation: 20770
You need to produce a table with zeroes.
For this sequence()
and UNNEST
come particularily useful.
Here is an example where your data has only quarter and count (no id
).
presto:default> SELECT coalesce(t.q, a.q), coalesce(t.c, 0)
-> FROM (VALUES ('2018Q1', 1)) t(q, c)
-> RIGHT OUTER JOIN (
-> SELECT CAST(y AS varchar) || 'Q' || CAST(x AS varchar) AS q
-> FROM UNNEST(sequence(2018, 2019)) _(y)
-> CROSS JOIN UNNEST(sequence(1, 4)) _(x)
-> ) a ON t.q = a.q
-> ;
_col0 | _col1
--------+-------
2018Q1 | 1
2018Q3 | 0
2019Q1 | 0
2018Q4 | 0
2019Q3 | 0
2019Q2 | 0
2018Q2 | 0
2019Q4 | 0
(8 rows)
(Tested on Presto 322)
You can generalize it easily for id
:
presto:default> WITH source_table AS (
-> SELECT *
-> FROM (VALUES (1, '2018Q1', 13), (2, '2018Q2', 42)) t(id, q, c)
-> )
-> SELECT id, q, coalesce(c, 0)
-> FROM source_table
-> RIGHT OUTER JOIN (
-> SELECT id, CAST(y AS varchar) || 'Q' || CAST(x AS varchar) AS q
-> FROM (SELECT DISTINCT id FROM source_table)
-> CROSS JOIN UNNEST(sequence(2018, 2019)) _(y)
-> CROSS JOIN UNNEST(sequence(1, 4)) _(x)
-> ) a USING(id, q)
-> ;
id | q | _col2
----+--------+-------
1 | 2018Q1 | 13
2 | 2018Q2 | 42
2 | 2019Q3 | 0
1 | 2018Q2 | 0
1 | 2019Q3 | 0
2 | 2018Q4 | 0
1 | 2019Q2 | 0
1 | 2019Q4 | 0
2 | 2019Q4 | 0
2 | 2018Q3 | 0
2 | 2019Q2 | 0
1 | 2018Q3 | 0
1 | 2019Q1 | 0
2 | 2019Q1 | 0
2 | 2018Q1 | 0
1 | 2018Q4 | 0
(16 rows)
Upvotes: 0