Artem
Artem

Reputation: 833

Reusing value for multiple dates in SQL

I have a table that looks like this

ID            Type               Change_Date               
1              t1                2015-10-08
1              t2                2016-01-03
1              t3                2016-03-07
2              t1                2017-12-13
2              t2                2018-02-01

It shows if a customer has changed account type and when. However, I'd like a query that can give me the follow output

ID            Type               Change_Date               
1              t1                2015-10
1              t1                2015-11
1              t1                2015-12
1              t2                2016-01
1              t2                2016-02
1              t3                2016-03
1              t3                2016-04
...            ...               ...
1              t3                2018-10

for each ID. The output shows what account type the customer had for each month until the current month. My problem is filling in the "empty" months. In some cases the interval between account changes can be more than a year.

I hope this makes sense.

Thanks in advance.

Upvotes: 0

Views: 199

Answers (1)

Archon
Archon

Reputation: 1477

Base on Presto SQL(because your origin question is about Presto/SQL)


Update in 2018-11-01: use lead() to simplify SQL


Prepare data

Table mytable same as yours

id  type  update_date
1   t1    2015-10-08
1   t2    2016-01-03
1   t3    2016-03-07
2   t1    2017-12-13
2   t2    2018-02-01

Table t_month is a dictionary table which has all month data from 2015-01 to 2019-12. This kind of dictionary tables are useful.

ym
2015-01
2015-02
2015-03
2015-04
2015-05
2015-06
2015-07
2015-08
2015-09
...
2019-12

Add lifespan for mytable

Normally, your should 'manage' your data like their lifespan. So mytable should like

id  type   start_date      end_date
1   t1     2015-10-08      2016-01-03
1   t2     2016-01-03      2016-03-07
1   t3     2016-03-07      null
2   t1     2017-12-13      2018-02-01
2   t2     2018-02-01      null

But in this case, you don't. So next step is 'create' one. Use lead() window function.

select 
    id,
    type, 
    date_format(update_date, '%Y-%m') as start_month,
    lead(
        date_format(update_date, '%Y-%m'), 
        1, -- next one
        date_format(current_date+interval '1' month, '%Y-%m') -- if null return next month
    ) over(partition by id order by update_date) as end_month
from mytable

Output

id  type  start_month  end_month
1   t1    2015-10     2016-01
1   t2    2016-01     2016-03
1   t3    2016-03     2018-11
2   t1    2017-12     2018-02
2   t2    2018-02     2018-11

Cross join id and month

It's simple

with id_month as (
    select * from t_month 
    cross join (select distinct id from mytable)
)
select * from id_month

Output

ym      id
2015-01 1
2015-02 1
2015-03 1
...
2019-12 1
2015-01 2
2015-02 2
2015-03 2
...
2019-12 2

Finally

Now, you can use subquery in select clause

select 
    id,
    type,
    ym
from (
    select
        t1.id,
        t1.ym,
        (select type from mytable2 where t1.id = id and t1.ym >= start_month and t1.ym < end_month) as type
    from id_month t1
)
where type is not null
-- order by id, ym

Full sql

with mytable2 as (
    select 
        id,
        type, 
        date_format(update_date, '%Y-%m') as start_month,
        lead(
            date_format(update_date, '%Y-%m'), 
            1, -- next one
            date_format(current_date+interval '1' month, '%Y-%m') -- if null return next month
        ) over(partition by id order by update_date) as end_month
    from mytable
)
, id_month as (
    select * from t_month 
    cross join (select distinct id from mytable)
)
select 
    id,
    type,
    ym
from (
    select
        t1.id,
        t1.ym,
        (select type from mytable2 where t1.id = id and t1.ym >= start_month and t1.ym < end_month) as type
    from id_month t1
)
where type is not null
--order by id, ym

Output

id  type  ym
1   t1    2015-10
1   t1    2015-11
1   t1    2015-12
1   t2    2016-01
1   t2    2016-02
1   t3    2016-03
1   t3    2016-04
...
1   t3    2018-10
2   t1    2017-12
2   t1    2018-01
2   t2    2018-02
...
2   t2    2018-10

Upvotes: 2

Related Questions