cheb1k4

Reputation: 2464

Sum of last value from users

I have this table in my postgresql database:

purchase

userid |    date    | price
---------------------------
     1 | 2016-01-06 |    10
     1 | 2016-01-05 |     5
     2 | 2016-01-06 |    12
     2 | 2016-01-05 |    15

I want the sum of the last purchase price across all users. For user 1 the last purchase is on 2016-01-06 and the price is 10. For user 2 the last purchase is on 2016-01-06 and the price is 12. So the result of the SQL query should be 22.

How can I do that in SQL?
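To try the answers below, the sample data can be loaded with a script like this. It is a sketch: the question does not give column types, so integer/date/integer are assumptions, and a TEMP table is used so it cannot clash with a real purchase table.

```sql
-- Sample data from the question (column types are assumed).
-- A temporary table is dropped automatically at the end of the session.
CREATE TEMP TABLE purchase (
    userid integer NOT NULL,
    date   date    NOT NULL,
    price  integer NOT NULL
);

INSERT INTO purchase (userid, date, price) VALUES
    (1, DATE '2016-01-06', 10),
    (1, DATE '2016-01-05',  5),
    (2, DATE '2016-01-06', 12),
    (2, DATE '2016-01-05', 15);

-- Each user's latest purchase date (2016-01-06 for both users);
-- the prices on those dates are 10 and 12, so the expected answer is 22.
SELECT userid, max(date) AS last_date
FROM purchase
GROUP BY userid
ORDER BY userid;
```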

Upvotes: 1

Views: 361

Answers (4)

cheb1k4

Reputation: 2464

All the proposed solutions are good and work, but since my table contains millions of records I had to find the most efficient way to do this. It seems that the better way is to start from the user table, using the foreign key between the tables purchase and user (which I didn't mention in my question, my apologies): purchase.userid -> user.id. Knowing this, I can write the following query:

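select sum(t.price) from (
    -- latest price of each user; NULL for users without purchases (ignored by sum)
    select (select price from purchase p where p.userid = u.id order by date desc limit 1) as price
    from "user" u  -- "user" is a reserved word in PostgreSQL and must be quoted
) t;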

EDIT

To answer @a_horse_with_no_name, here is the output of EXPLAIN ANALYZE VERBOSE for his solution and for mine:
His solution:

Aggregate  (cost=64032401.30..64032401.31 rows=1 width=4) (actual time=566101.129..566101.129 rows=1 loops=1)
    Output: sum(purchase.price)
    ->  Unique  (cost=62532271.89..64032271.89 rows=10353 width=16) (actual time=453849.494..566087.948 rows=12000 loops=1)
          Output: purchase.userid, purchase.price, purchase.date
          ->  Sort  (cost=62532271.89..63282271.89 rows=300000000 width=16) (actual time=453849.492..553060.789 rows=300000000 loops=1)
                Output: purchase.userid, purchase.price, purchase.date
                Sort Key: purchase.userid, purchase.date
                Sort Method: external merge  Disk: 7620904kB
                ->  Seq Scan on public.purchase  (cost=0.00..4910829.00 rows=300000000 width=16) (actual time=0.457..278058.430 rows=300000000 loops=1)
                      Output: purchase.userid, purchase.price, purchase.date
Planning time: 0.076 ms
Execution time: 566433.215 ms

My solution:

Aggregate  (cost=28366.33..28366.34 rows=1 width=4) (actual time=53914.690..53914.690 rows=1 loops=1)
    Output: sum((SubPlan 1))
    ->  Seq Scan on public.user2 u  (cost=0.00..185.00 rows=12000 width=4) (actual time=0.021..3.816 rows=12000 loops=1)
          Output: u.id, u.name
    SubPlan 1
      ->  Limit  (cost=0.57..2.35 rows=1 width=12) (actual time=4.491..4.491 rows=1 loops=12000)
            Output: p.price, p.date
            ->  Index Scan Backward using purchase_user_date on public.purchase p  (cost=0.57..51389.67 rows=28977 width=12) (actual time=4.490..4.490 rows=1 loops=12000)
                  Output: p.price, p.date
                  Index Cond: (p.userid = u.id)
Planning time: 0.115 ms
Execution time: 53914.730 ms

My table contains 300 million records.
I don't know if it's relevant, but I also have an index on purchase (userid, date).
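A sketch of the schema this answer describes, so the query can be run end to end. Assumptions: the column types, the name column (inferred from u.name in the plan output), the user names 'alice' and 'bob' (hypothetical), and that purchase_user_date, the index named in the plan, is a plain B-tree on (userid, date). The table name "user" has to be double-quoted because USER is a reserved word in PostgreSQL; the plan shows the asker's real table was public.user2.

```sql
-- "user" is a reserved word in PostgreSQL, so the name is quoted.
CREATE TEMP TABLE "user" (
    id   integer PRIMARY KEY,
    name text
);

-- Foreign key purchase.userid -> "user".id, as described in the answer.
CREATE TEMP TABLE purchase (
    userid integer NOT NULL REFERENCES "user" (id),
    date   date    NOT NULL,
    price  integer NOT NULL
);

-- B-tree on (userid, date): each per-user lookup can walk it backward and
-- stop at the first entry (the "Index Scan Backward" step in the plan).
CREATE INDEX purchase_user_date ON purchase (userid, date);

INSERT INTO "user" (id, name) VALUES (1, 'alice'), (2, 'bob');  -- hypothetical names
INSERT INTO purchase (userid, date, price) VALUES
    (1, DATE '2016-01-06', 10),
    (1, DATE '2016-01-05',  5),
    (2, DATE '2016-01-06', 12),
    (2, DATE '2016-01-05', 15);

-- One correlated subquery per user row; returns 22 for this data.
SELECT sum(t.price) AS total
FROM (
    SELECT (SELECT p.price
            FROM purchase p
            WHERE p.userid = u.id
            ORDER BY p.date DESC
            LIMIT 1) AS price
    FROM "user" u
) t;
```

On a 4-row table the planner will most likely choose a sequential scan; the index only pays off at the scale described in the answer.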

Upvotes: 1

Lukasz Szozda

Reputation: 175596

You can use a window function to rank each user's purchases and then use a normal aggregation with SUM:

WITH cte AS
(
   SELECT *, RANK() OVER(PARTITION BY userid ORDER BY "date" DESC) AS r
   FROM purchase
)
SELECT SUM(price) AS total
FROM cte
WHERE r = 1;


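Keep in mind that this solution counts ties: if a user has several purchases on the same latest date, RANK() gives all of them rank 1 and all of their prices are added to the sum. To get only one purchase per user you need a column that is distinct within each group (like a datetime instead of a date), but even then ties are still possible.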
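A small illustration of the tie behaviour, assuming a hypothetical extra purchase (2, 2016-01-06, 3) added to the question's data so that user 2 has two purchases on the latest date. RANK() gives both of them rank 1, so both prices end up in the total.

```sql
-- Question data plus one hypothetical row that ties user 2's latest date.
WITH purchase (userid, date, price) AS (
    VALUES (1, DATE '2016-01-06', 10),
           (1, DATE '2016-01-05',  5),
           (2, DATE '2016-01-06', 12),
           (2, DATE '2016-01-05', 15),
           (2, DATE '2016-01-06',  3)   -- hypothetical tie
),
ranked AS (
    SELECT price,
           RANK() OVER (PARTITION BY userid ORDER BY date DESC) AS r
    FROM purchase
)
SELECT SUM(price) AS total   -- 10 + 12 + 3 = 25, not 22
FROM ranked
WHERE r = 1;
```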

EDIT:

Handling ties:

CREATE TABLE purchase (
    userid INTEGER   NOT NULL,
    date   TIMESTAMP NOT NULL,
    price  INTEGER   NOT NULL
);
INSERT INTO purchase (userid, date, price) VALUES
    (1, TIMESTAMP '2016-01-06 12:00:00',   10),
    (1, TIMESTAMP '2016-01-05',             5),
    (2, TIMESTAMP '2016-01-06 13:00:00',   12),
    (2, TIMESTAMP '2016-01-05',            15),
    (2, TIMESTAMP '2016-01-06 13:00:00', 1000);  -- same user and timestamp as the price-12 row: a tie

Note the difference between RANK() and ROW_NUMBER():

SqlFiddleDemo_RANK SqlFiddleDemo_ROW_NUMBER SqlFiddleDemo_ROW_NUMBER_2

Output:

╔════════╦══════════════╦══════════════╗
║ RANK() ║ ROW_NUMBER() ║ ROW_NUMBER() ║
╠════════╬══════════════╬══════════════╣
║   1022 ║           22 ║         1010 ║
╚════════╩══════════════╩══════════════╝

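Without a UNIQUE index on (userid, date) there is always a (probably small) possibility of a tie. Any solution that relies on ORDER BY to pick one row per user needs a deterministic ordering; otherwise the row it picks, and therefore the sum, can change between runs.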
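One way to make the ROW_NUMBER() variant deterministic is to add a tie-breaking key to the window's ORDER BY. This sketch uses price DESC as the tie-breaker, an arbitrary choice for illustration that keeps the highest price among tied rows; a unique column such as a purchase id, if the real table has one, would be a more natural choice. The data is the tie example from this answer.

```sql
WITH purchase (userid, date, price) AS (
    VALUES (1, TIMESTAMP '2016-01-06 12:00:00',   10),
           (1, TIMESTAMP '2016-01-05 00:00:00',    5),
           (2, TIMESTAMP '2016-01-06 13:00:00',   12),
           (2, TIMESTAMP '2016-01-05 00:00:00',   15),
           (2, TIMESTAMP '2016-01-06 13:00:00', 1000)
),
ranked AS (
    SELECT price,
           -- price DESC breaks ties between rows with the same timestamp.
           -- If date and price are both equal, either row may get rn = 1,
           -- but the sum is the same on every run.
           ROW_NUMBER() OVER (PARTITION BY userid
                              ORDER BY date DESC, price DESC) AS rn
    FROM purchase
)
SELECT SUM(price) AS total   -- 10 + 1000 = 1010
FROM ranked
WHERE rn = 1;
```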

Upvotes: 4

krokodilko

Reputation: 36097

You can use a LATERAL join in this case:

SELECT sum(price) 
FROM (
  select distinct userid FROM purchase
) u, 
LATERAL (
  SELECT price FROM purchase p
  WHERE p.userid = u.userid
  ORDER BY date DESC LIMIT 1
) x

demo: http://sqlfiddle.com/#!15/5569b/5
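If a separate users table exists (as the asker mentions in their own answer), the LATERAL subquery can be driven by it instead of SELECT DISTINCT userid FROM purchase, which avoids reading every purchase row just to collect the user ids. In this sketch both tables are inlined as VALUES lists; the users list is hypothetical. On a real table, an index on purchase (userid, date) would typically let each lateral lookup stop after one index entry.

```sql
WITH users (id) AS (
    VALUES (1), (2)                 -- hypothetical user table
),
purchase (userid, date, price) AS (
    VALUES (1, DATE '2016-01-06', 10),
           (1, DATE '2016-01-05',  5),
           (2, DATE '2016-01-06', 12),
           (2, DATE '2016-01-05', 15)
)
SELECT sum(x.price) AS total        -- 10 + 12 = 22
FROM users u
CROSS JOIN LATERAL (
    -- latest purchase of this user only
    SELECT p.price
    FROM purchase p
    WHERE p.userid = u.id
    ORDER BY p.date DESC
    LIMIT 1
) x;
```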

Upvotes: 1

user330315


To get the "latest" price, you can use DISTINCT ON (...) in Postgres:

select distinct on (userid) userid, date, price
from the_table
order by userid, date desc

Now you only need to sum up all prices returned by the above statement:

select sum(price)
from (
   select distinct on (userid) userid, price
   from the_table
   order by userid, date desc
) t;
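A runnable version on the question's sample data, with the_table replaced by purchase. In PostgreSQL the DISTINCT ON expressions must match the leftmost ORDER BY expressions, which is why userid comes first. The extra sort key price DESC is an addition for illustration (an arbitrary choice) that makes the kept row deterministic when a user has two purchases on the same latest date.

```sql
WITH purchase (userid, date, price) AS (
    VALUES (1, DATE '2016-01-06', 10),
           (1, DATE '2016-01-05',  5),
           (2, DATE '2016-01-06', 12),
           (2, DATE '2016-01-05', 15)
)
SELECT sum(price) AS total          -- 10 + 12 = 22
FROM (
    -- keeps the first row of each userid group in ORDER BY order
    SELECT DISTINCT ON (userid) userid, price
    FROM purchase
    ORDER BY userid, date DESC, price DESC
) t;
```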

Upvotes: 3
