ilearn
ilearn

Reputation: 193

How to generate series using start and end date and quarters on postgres

I have a table like shown below where I want to use the start and end date to evenly distribute the value for each row to the 3 months in each quarter to all of the quarters in between start and end date (last two columns).

I am familiar with generate series and intervals in Postgres but I am having hard time to get what I want.

My table has and ID column that groups rows together, a quarter column that indicates which quarter the row references for the ID, a value column that is the value for the whole quarter (and every quarter in the date range), and start_date and end_date columns indicating the date range. Here is a sample:

ID  quarter     value   start_date  end_date 
1   2            152    2019-11-07  2050-12-30
1   1            785    2019-11-07  2050-12-30
2   2            152    2019-03-05  2050-12-30
2   1            785    2019-03-05  2050-12-30
3   4            41     2018-06-12  2050-12-30
3   3            50     2018-06-12  2050-12-30
3   2            88     2018-06-12  2050-12-30
3   1            29     2018-06-12  2050-12-30
4   2           1607    2018-12-17  2050-12-30
4   1           4803    2018-12-17  2050-12-30

Here is my desired output (for ID 1):

ID  quarter     value   start_date  end_date 
1   2            152/3  2020-04-01  2020-07-01
1   1            785/3  2020-01-01  2020-04-01
1   2            152/3  2021-04-01  2021-07-01
1   1            785/3  2021-01-01  2021-04-01

start_date in the output will be the next quarter on first table. I need the series to be generated from the start_date to the end_date of the first table.

Upvotes: 1

Views: 1820

Answers (2)

etch_45
etch_45

Reputation: 792

This is one way to go about it. Showing an example based on the output you've outlined. You can then add more conditions to the CASE/WHEN for additional quarters.

SELECT
    ID,
    Quarter,
    Value/3 AS "Value",
    CASE
     WHEN Quarter = 1 THEN '2020-01-01'
     WHEN Quarter = 2 THEN '2020-04-01'
    END AS "Start_Date",
    CASE
     WHEN Quarter = 1 THEN '2020-04-01'
     WHEN Quarter = 2 THEN '2020-07-01'
    END AS "End_Date"
   FROM
    Table

Upvotes: 1

Henry
Henry

Reputation: 15732

You can do this by using the GENERATE_SERIES function and passing in the start and end date for each unique (by ID) row and setting the interval to 3 months. Then join the result back with your original table on both ID and quarter.

Here's an example (note original_data is what I've called your first table):

WITH
quarters_table AS (
    SELECT
        t.ID,
        (EXTRACT('month' FROM t.quarter_date) - 1)::INT / 3 + 1 AS quarter,
        t.quarter_date::DATE AS start_date,
        COALESCE(
            LEAD(t.quarter_date) OVER (),
            DATE_TRUNC('quarter', t.original_end_date) + INTERVAL '3 months'
        )::DATE AS end_date
    FROM (
        SELECT
            original_record.ID,
            original_record.end_date AS original_end_date,
            GENERATE_SERIES(
                DATE_TRUNC('quarter', original_record.start_date),
                DATE_TRUNC('quarter', original_record.end_date),
                INTERVAL '3 months'
            ) AS quarter_date
        FROM (
            SELECT DISTINCT ON (original_data.ID)
                original_data.ID,
                original_data.start_date,
                original_data.end_date
            FROM
                original_data
            ORDER BY
                original_data.ID
        ) AS original_record
    ) AS t
)

SELECT
    quarters_table.ID,
    quarters_table.quarter,
    original_data.value::DOUBLE PRECISION / 3 AS value,
    quarters_table.start_date,
    quarters_table.end_date
FROM
    quarters_table
INNER JOIN
    original_data
ON
    quarters_table.ID = original_data.ID
    AND quarters_table.quarter = original_data.quarter;

Sample output:

 id | quarter |      value       | start_date |  end_date  
----+---------+------------------+------------+------------
  1 |       1 | 261.666666666667 | 2020-01-01 | 2020-04-01
  1 |       2 | 50.6666666666667 | 2020-04-01 | 2020-07-01
  1 |       1 | 261.666666666667 | 2021-01-01 | 2021-04-01
  1 |       2 | 50.6666666666667 | 2021-04-01 | 2021-07-01

For completeness, here's the original_data table I've used in testing:

WITH
original_data AS (
    SELECT
        1 AS ID,
        2 AS quarter,
        152 AS value,
        '2019-11-07'::DATE AS start_date,
        '2050-12-30'::DATE AS end_date
    
    UNION ALL

    SELECT
        1 AS ID,
        1 AS quarter,
        785 AS value,
        '2019-11-07'::DATE AS start_date,
        '2050-12-30'::DATE AS end_date
    
    UNION ALL

    SELECT
        2 AS ID,
        2 AS quarter,
        152 AS value,
        '2019-03-05'::DATE AS start_date,
        '2050-12-30'::DATE AS end_date
    
    -- ...
)

Upvotes: 2

Related Questions