ruslan_krivoshein
ruslan_krivoshein

Reputation: 354

Get range of months between 2 dates

For instance we have rows like

in out
2020-01-01 2020-03-03

and I need result with a list of months between, i.e.:

in out between
2020-01-01 2020-03-03 ["2020-01-01", "2020-02-01", "2020-03-01"]

to count then all months by some grouping.

There can be any day, but with kept months.
I try to reach this via combination of explode/posexplode, months_between, add_months, CTE, but have a lack of brain yet. One of:

select
    *,
    case
        when cast(
            ceil(abs(months_between(test.in, test.out))) as int
        ) > 1 then split(
            repeat(
                concat(add_months(test.in, 1), ','),
                cast(
                    ceil(abs(months_between(test.in, test.out))) as int
                )
            ),
            ','
        )
        else array(test.in)
    end btw
from test;
in out btw
2020-01-01 2020-03-03 ["2020-02-01", "2020-02-01", "2020-02-01", ""]

I can't figure out how to increment months inside query with months_between. I believe there is some way, but I need a pointer.

Upvotes: 1

Views: 607

Answers (1)

leftjoin
leftjoin

Reputation: 38290

Use lateral view posexplode to get rows with position numbers, add position to the IN date using add_months, collect_list to get an array, concatenate array using concat_ws if necessary.

Demo:

WITH test AS (
    SELECT '2020-01-06' dt_in, '2020-03-03' dt_out
)
SELECT dt_in, dt_out, 
       COLLECT_LIST(DATE_FORMAT(ADD_MONTHS(t.dt_in, pos), 'yyyy-MM-01')) AS `between`
FROM test t  
LATERAL VIEW OUTER POSEXPLODE(
    SPLIT(
        SPACE(
            CAST(CEIL(ABS(MONTHS_BETWEEN(DATE_FORMAT(t.dt_in, 'yyyy-MM-01'), t.dt_out))) AS INT) - 1
            ),
        ' '
        )
) e AS pos, x  
GROUP BY dt_in, dt_out;

Result:

     dt_in       dt_out       between   

2020-01-01       2020-03-03   ["2020-01-01","2020-02-01","2020-03-01"]

Upvotes: 1

Related Questions