Reputation: 833
I have a table that looks like this
ID Type Change_Date
1 t1 2015-10-08
1 t2 2016-01-03
1 t3 2016-03-07
2 t1 2017-12-13
2 t2 2018-02-01
It shows if a customer has changed account type and when. However, I'd like a query that can give me the follow output
ID Type Change_Date
1 t1 2015-10
1 t1 2015-11
1 t1 2015-12
1 t2 2016-01
1 t2 2016-02
1 t3 2016-03
1 t3 2016-04
... ... ...
1 t3 2018-10
for each ID
. The output shows what account type the customer had for each month until the current month. My problem is filling in the "empty" months. In some cases the interval between account changes can be more than a year.
I hope this makes sense.
Thanks in advance.
Upvotes: 0
Views: 199
Reputation: 1477
Base on Presto SQL(because your origin question is about Presto/SQL)
Update in 2018-11-01: use lead()
to simplify SQL
Table mytable
same as yours
id type update_date
1 t1 2015-10-08
1 t2 2016-01-03
1 t3 2016-03-07
2 t1 2017-12-13
2 t2 2018-02-01
Table t_month
is a dictionary table which has all month data from 2015-01
to 2019-12
. This kind of dictionary tables are useful.
ym
2015-01
2015-02
2015-03
2015-04
2015-05
2015-06
2015-07
2015-08
2015-09
...
2019-12
mytable
Normally, your should 'manage' your data like their lifespan. So mytable
should like
id type start_date end_date
1 t1 2015-10-08 2016-01-03
1 t2 2016-01-03 2016-03-07
1 t3 2016-03-07 null
2 t1 2017-12-13 2018-02-01
2 t2 2018-02-01 null
But in this case, you don't. So next step is 'create' one. Use lead()
window function.
select
id,
type,
date_format(update_date, '%Y-%m') as start_month,
lead(
date_format(update_date, '%Y-%m'),
1, -- next one
date_format(current_date+interval '1' month, '%Y-%m') -- if null return next month
) over(partition by id order by update_date) as end_month
from mytable
Output
id type start_month end_month
1 t1 2015-10 2016-01
1 t2 2016-01 2016-03
1 t3 2016-03 2018-11
2 t1 2017-12 2018-02
2 t2 2018-02 2018-11
id
and month
It's simple
with id_month as (
select * from t_month
cross join (select distinct id from mytable)
)
select * from id_month
Output
ym id
2015-01 1
2015-02 1
2015-03 1
...
2019-12 1
2015-01 2
2015-02 2
2015-03 2
...
2019-12 2
Now, you can use subquery
in select
clause
select
id,
type,
ym
from (
select
t1.id,
t1.ym,
(select type from mytable2 where t1.id = id and t1.ym >= start_month and t1.ym < end_month) as type
from id_month t1
)
where type is not null
-- order by id, ym
with mytable2 as (
select
id,
type,
date_format(update_date, '%Y-%m') as start_month,
lead(
date_format(update_date, '%Y-%m'),
1, -- next one
date_format(current_date+interval '1' month, '%Y-%m') -- if null return next month
) over(partition by id order by update_date) as end_month
from mytable
)
, id_month as (
select * from t_month
cross join (select distinct id from mytable)
)
select
id,
type,
ym
from (
select
t1.id,
t1.ym,
(select type from mytable2 where t1.id = id and t1.ym >= start_month and t1.ym < end_month) as type
from id_month t1
)
where type is not null
--order by id, ym
Output
id type ym
1 t1 2015-10
1 t1 2015-11
1 t1 2015-12
1 t2 2016-01
1 t2 2016-02
1 t3 2016-03
1 t3 2016-04
...
1 t3 2018-10
2 t1 2017-12
2 t1 2018-01
2 t2 2018-02
...
2 t2 2018-10
Upvotes: 2