Ben P
Ben P

Reputation: 3379

Populate one column based on the previous values of another

I am trying to create a column which populates the transaction ID for every row up until the row where that transaction was completed - in this example every "add to basket" event before an order.

So far I have tried using FIRST_VALUE:

SELECT 
UserID, date, session_id, hitnumber, add_to_basket, transactionid, 
first_value(transactionid) over (partition by trans_part order by date, transactionid) AS t_id
FROM(
  select UserID, date, session_id, hitnumber, add_to_basket, transactionid, 
  SUM(CASE WHEN transactionid IS NULL THEN 0 ELSE 1 END) OVER (ORDER BY date, transactionid) AS trans_part,
  FIRST_VALUE(transactionid IGNORE NULLS) 
OVER (PARTITION BY userid ORDER BY hitnumber ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS t_id,
  from q1
  join q2 using (session_id)
  order by 1,2,3,4
)

But the result I am getting is the inverse of what I want, populating the transaction ID of the previous order against the basket events which happened after this transaction.

How can I change my code so that I will see the transaction id of the order AFTER the basket events that led up to it? For example, in the table below I want to see the transaction id ending in ...095 instead of the id ending in ...383 for the column t_id.

enter image description here

Based on Gordon's answer below I have also tried:

last_value(transactionid ignore nulls) over(
  order by hitnumber 
  rows between unbounded preceding and current row) as t_id2,

But this is not populating the event rows which proceed a transaction with a transaction id (seen below as t_id2): enter image description here

Upvotes: 0

Views: 46

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270773

You can use last_value(ignore nulls):

select . . . ,
       last_value(transaction_id ignore nulls) over (
           order by hitnumber
           rows between unbounded preceding and current row
          ) as t_id
from q1 join
     q2 using (session_id);

The difference from your answer is the windowing clause which ends at the current row.

EDIT:

It looks like there is one t_id per session_id, so just use max():

select . . . ,
       max(transaction_id) over (partition by session_id) as t_id
from q1 join
     q2 using (session_id);

Upvotes: 1

Related Questions