titutubs
titutubs

Reputation: 365

How to join user with data from last 3 sessions in SQL?

I have a table with user purchase information. Every user can buy at any day. Now, I want to create a new column that does a self-join but uses an aggregate of only the last 3 purchases of that user. How would I do this?

user | date       | purchase_amount
1    | 2020-01-01 | 10
1    | 2020-01-04 | 4
1    | 2020-01-05 | 1
1    | 2020-02-01 | 6
2    | ....

Now, I want to add a 4th column which takes the average of the last (up to) 3 purchases PRIOR to the current date.

user | date       | purchase_amount | past_3_purchases_avg
1    | 2020-01-01 | 10              | 0 (no prior purchase)
1    | 2020-01-04 | 4               | 10 (last and only purchase is $10)
1    | 2020-01-05 | 1               | 7  (last 2 purchases are $4 and $10)
1    | 2020-02-01 | 6               | 5  (last 3 purchases $1 and $4 and $10)
1    | 2020-02-04 | 3               | 3.6 (last 3 purchases $1 and $4 and 6$)
2    | ....

With window functions and lag function this takes too long given my dataset with multi-millions of rows. What is a good self join solution?

Upvotes: 0

Views: 41

Answers (1)

Luke Kubat
Luke Kubat

Reputation: 381

I know you said windowed functions didn't work for you, but you also said you were using lead and lag, which are not really ideal.

This code should be faster than any self join:

SELECT 
    *, 
    AVG([purchase_amount]) 
    OVER(   PARTITION BY [userid]
            ORDER BY [date] DESC 
            ROWS BETWEEN 3 PRECEEDING 
            AND 1 PRECEEDING
    ) [Last3Avg]
FROM [data]

If that's not faster, I'd have a look at your indexing, cause I don't think a self join is ever going to beat that performance-wise if you've got a userid/date index that includes purchase_amount.

Upvotes: 1

Related Questions