j Rodr

Reputation: 105

Postgres - calculate change in cumulative data

I am gathering data from a few API sources via Python, and adding this to 2 tables in Postgres.

I then use this data to make reports, joining and grouping/filtering the data. Every day I add thousands of rows.

Cost, revenue and sales are always cumulative, meaning that each data point covers the period from t1 for that product up to t2, the time of data retrieval.

The latest data pull therefore includes all the previous data back to t1. Both t1 and t2 are timestamp without time zone columns in Postgres; I currently use Postgres 10.

sample:

id, vendor_id, product_id, t1, t2, cost, revenue, sales
1, a, a, 2018-01-01, 2018-04-18, 50, 200, 34
2, a, b, 2018-05-01, 2018-04-18, 10, 100, 10
3, a, c, 2018-01-02, 2018-04-18, 12, 100, 9
4, a, d, 2018-01-03, 2018-04-18, 12, 100, 8
5, b, e, 2018-02-25, 2018-04-18, 12, 100, 7

6, a, a, 2018-01-01, 2018-04-17, 40, 200, 30
7, a, b, 2018-05-01, 2018-04-17, 0, 95, 8
8, a, c, 2018-01-02, 2018-04-17, 10, 12, 5
9, a, d, 2018-01-03, 2018-04-17, 8, 90, 4
10, b, e, 2018-02-25, 2018-04-17, 9, 0, 3

Cost and revenue are from two tables, and I join them on vendor_id, product_id and t2.

Is there a way to go through all of the data, "shift" it and subtract, so that instead of cumulative data I have time-series data (one value per interval)?

Should this be done prior to storing it, or better done when making the reports?

For reference, when I currently want a report showing the change between two times, I do two subqueries, but that seems backwards compared to having the data as a time series and simply aggregating the needed intervals:

with report1 as (select ...),
report2 as (select ...)
select .. from report1 left outer join report2 on ...
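
For illustration, the two-subquery version looks roughly like the sketch below; report_data and the two snapshot dates are placeholders, not my real schema:

-- hypothetical: the cumulative cost/revenue data already joined into one source, report_data
with report1 as (
    select vendor_id, product_id, cost, revenue, sales
    from report_data
    where t2 = '2018-04-18'
),
report2 as (
    select vendor_id, product_id, cost, revenue, sales
    from report_data
    where t2 = '2018-04-17'
)
select r1.vendor_id
     , r1.product_id
     , r1.cost - coalesce(r2.cost, 0)       as cost_change
     , r1.revenue - coalesce(r2.revenue, 0) as revenue_change
     , r1.sales - coalesce(r2.sales, 0)     as sales_change
from report1 r1
left outer join report2 r2
  on r2.vendor_id = r1.vendor_id
 and r2.product_id = r1.product_id;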

Thanks a lot in advance!

JR

Upvotes: 4

Views: 274

Answers (1)

Georgi Raychev

Reputation: 1334

You can use LAG():

From the documentation on Window Functions (describing lag()):

...returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null.

with sample_data as (
        select 1 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-18'::date as t2, 50 as cost, 200 as revenue, 36 as sales
        union all
        select 2 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-18'::date as t2, 55 as cost, 200 as revenue, 34 as sales
        union all
        select 3 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-17'::date as t2, 35 as cost, 150 as revenue, 25 as sales
        union all
        select 4 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-17'::date as t2, 25 as cost, 140 as revenue, 23 as sales
        union all
        select 5 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-16'::date as t2, 16 as cost, 70 as revenue, 12 as sales
        union all
        select 6 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-16'::date as t2, 13 as cost, 65 as revenue, 11 as sales
)
-- per vendor/product, ordered by snapshot time t2, subtract the previous
-- snapshot's cumulative value; the first snapshot has no previous row, so
-- coalesce falls back to the raw cumulative value
select sd.*
    , coalesce(cost - lag(cost) over (partition by vendor_id, product_id order by t2), cost) cost_new
    , coalesce(revenue - lag(revenue) over (partition by vendor_id, product_id order by t2), revenue) revenue_new
    , coalesce(sales - lag(sales) over (partition by vendor_id, product_id order by t2), sales) sales_new
from sample_data sd
order by t2 desc
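
Applied to your setup, the same thing can be done at report time over the joined tables. A minimal sketch, assuming the two tables are called cost_data and revenue_data (placeholder names; sales is assumed to live in revenue_data) and that there is one row per vendor_id, product_id and t2:

-- delta per snapshot; cost_data and revenue_data are hypothetical table names
select c.vendor_id
     , c.product_id
     , c.t2
     , c.cost    - coalesce(lag(c.cost)    over w, 0) as cost_delta
     , r.revenue - coalesce(lag(r.revenue) over w, 0) as revenue_delta
     , r.sales   - coalesce(lag(r.sales)   over w, 0) as sales_delta
from cost_data c
join revenue_data r
  on r.vendor_id = c.vendor_id
 and r.product_id = c.product_id
 and r.t2 = c.t2
window w as (partition by c.vendor_id, c.product_id order by c.t2)
order by c.vendor_id, c.product_id, c.t2;

The named WINDOW clause just avoids repeating the same partition/order for every column; lag() with coalesce(..., 0) makes the first snapshot count fully as its own delta, the same way the query above falls back to the raw cumulative value.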

Upvotes: 1
