McRip
McRip

Reputation: 100

Aggregate window function and outer join

I am trying to solve the following question in a performance-oriented way. My current implementation involves ugly loops and is painfully slow.

Specifically, I have a table (transactions) of timestamped orders per customer for various items:

timestamp customer item volume
2000 Joe A 100
2001 Joe A 200
2001 Doe A 100

Besides, I have a second table (valuations) showing prices for the items:

timestamp item price
2000 A 1.1
2001 A 1.2
2002 A 1.3

Now, I would like to track the value (price*stock) of each customer's stock (cumulative volume) sampled according to the timestamp in the valuations table:

timestamp customer item stock value
2000 Joe A 100 110
2001 Joe A 300 360
2002 Joe A 300 390
2001 Doe A 100 120
2002 Doe A 100 130

Essentially, this is going to be some form of (right) joining transactions and valuations. However, the catch here is that I would have to do one right join per (customer, item) combination. In other words, for every (customer, item) I would have to join the full set of timestamps.

My current (potentially pretty inefficient) solution loops across customers. For every customer, it creates the cumulative volume, right joins valuations and forward-fills (using the last function) columns coming from the transactions table:

CREATE OR REPLACE FUNCTION public.last_func(anyelement, anyelement)
 RETURNS anyelement
 LANGUAGE sql
 IMMUTABLE STRICT
AS $function$
select $2;
$function$
;

   create or replace function last_func(anyelement, anyelement)
returns anyelement language sql immutable strict
as $$
    select $2;
$$;
    
select 
    valuations.timestamp,
    last(t.customer) over (partition by valuations.item order by valuations.timestamp) as customer,
    valuations.item,
    last(t.stock) over (partition by valuations.item order by valuations.timestamp) as stock,
    last(t.stock) over (partition by valuations.item order by valuations.timestamp) * valuations.price as value
from (select 
    timestamp,
    customer,
    item,
    volume as order_volume,
    sum(volume) over (partition by item order by item, timestamp) as stock
from 
    transactions
where customer = 'Joe') t
right join 
    valuations on t.timestamp = valuations.timestamp and t.item = valuations.item

This seems rather inefficient and becomes very slow for a large number of customers. Does anyone have an idea how to do that in one go? Would be great if you could help me out here.

Thanks in advance and best regards

Upvotes: 2

Views: 106

Answers (2)

shawnt00
shawnt00

Reputation: 17915

Looks like a good case for lateral join. This doesn't assume that the timestamps will be identical. I'm guessing that the general case will potentially have none or even multiple transactions between valuations. (I'm not even sure that you need an outer join.)

select v.*, stock * price
from valuations v left join lateral (
    select distinct on (customer) customer,
        sum(volume) over (partition by customer, item order by timestamp) as stock
    from transactions t
    where t.item = v.item and t.timestamp <= v.timestamp
    order by customer, timestamp desc
) t on true
order by customer, timestamp

https://dbfiddle.uk/?rdbms=postgres_10&fiddle=af82f52655dfc55029e430b7933cd899

Upvotes: 0

LukStorms
LukStorms

Reputation: 29647

Just a suggestion, since I can't test this on large amounts of data.

But what if you use a temporary table that contains all expected combinations of the customers and validations.

Then left join to customers to calculate the rolling sum.

For example:

create temporary table tmp_customer_valuations (
 timestamp int not null, 
 item varchar(30) not null, 
 customer varchar(30) not null, 
 price decimal(10,1) not null
);
insert into tmp_customer_valuations
(timestamp, item, price, customer)
select v.timestamp, v.item, v.price, c.customer
from valuations v
join (
  select item, customer, min(timestamp) as min_timestamp
  from transactions
  group by item, customer 
) c
  on c.item = v.item
 and c.min_timestamp <= v.timestamp
create index idx_tmp_customer_valuations
on tmp_customer_valuations (timestamp, item)
select 
  tmp.timestamp
, tmp.customer
, tmp.item
--, tr.volume as order_volume,
, sum(coalesce(tr.volume, 0)) 
     over (partition by tmp.item, tmp.customer 
           order by tmp.timestamp) as stock
, tmp.price * sum(coalesce(tr.volume, 0)) 
     over (partition by tmp.item, tmp.customer 
           order by tmp.timestamp) as value
from tmp_customer_valuations tmp
left join transactions tr
  on tr.timestamp = tmp.timestamp 
 and tr.item = tmp.item
 and tr.customer = tmp.customer
order by
 tmp.customer desc,
 tmp.item,
 tmp.timestamp;
timestamp customer item stock value
2000 Joe A 100 110.0
2001 Joe A 300 360.0
2002 Joe A 300 390.0
2001 Doe A 100 120.0
2002 Doe A 100 130.0

db<>fiddle here

(Btw, also verify if the tables could use an extra index)

Upvotes: 1

Related Questions