Reputation: 2464
I have this table in my PostgreSQL database:
purchase
userid | date       | price
-------+------------+------
1      | 2016-01-06 | 10
1      | 2016-01-05 | 5
2      | 2016-01-06 | 12
2      | 2016-01-05 | 15
I want the sum of the last purchase price of all users. For user 1 the last purchase is on 2016-01-06 and the price is 10. For user 2 the last purchase is on 2016-01-06 and the price is 12. So the result of the SQL query should be 22.
How can I do that in SQL?
Upvotes: 1
Views: 361
Reputation: 2464
All proposed solutions are good and work, but as my table contains millions of records I had to find the most efficient way to do what I want. It seems that the best way is to use the foreign key between the tables purchase and user (which I didn't mention in my question, my apologies), which is purchase.userid -> user.id. Knowing this, I can run the following query:
select sum(t.price) from (
  -- "user" is a reserved word in PostgreSQL, so the table name must be quoted
  select (select price from purchase p where p.userid = u.id order by date desc limit 1) as price
  from "user" u
) t;
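As a quick sanity check of the correlated-subquery approach on the four sample rows from the question, here is a minimal sketch. It runs against an in-memory SQLite database through Python's sqlite3 module instead of PostgreSQL, purely so it is self-contained; the table name users and the helper sum_latest_prices are hypothetical names chosen for the illustration.

```python
import sqlite3

def sum_latest_prices(conn):
    """Sum each user's most recent purchase price via a correlated subquery."""
    row = conn.execute(
        """
        SELECT SUM(t.price) FROM (
            SELECT (SELECT p.price
                    FROM purchase p
                    WHERE p.userid = u.id
                    ORDER BY p.date DESC
                    LIMIT 1) AS price
            FROM users u
        ) t
        """
    ).fetchone()
    return row[0]

# Hypothetical in-memory copy of the question's data (ISO dates sort correctly as text).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE purchase (userid INTEGER REFERENCES users(id), date TEXT, price INTEGER);
    CREATE INDEX purchase_user_date ON purchase (userid, date);
    INSERT INTO users (id) VALUES (1), (2);
    INSERT INTO purchase VALUES
        (1, '2016-01-06', 10), (1, '2016-01-05', 5),
        (2, '2016-01-06', 12), (2, '2016-01-05', 15);
    """
)
total = sum_latest_prices(conn)
print(total)  # 22
```

The subquery runs once per user and only needs the newest row for that user, which is why an index on (userid, date) matters so much for this shape of query.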
EDIT
To answer @a_horse_with_no_name, here is the explain analyse verbose output for his solution and mine:
His solution:
Aggregate (cost=64032401.30..64032401.31 rows=1 width=4) (actual time=566101.129..566101.129 rows=1 loops=1)
  Output: sum(purchase.price)
  -> Unique (cost=62532271.89..64032271.89 rows=10353 width=16) (actual time=453849.494..566087.948 rows=12000 loops=1)
        Output: purchase.userid, purchase.price, purchase.date
        -> Sort (cost=62532271.89..63282271.89 rows=300000000 width=16) (actual time=453849.492..553060.789 rows=300000000 loops=1)
              Output: purchase.userid, purchase.price, purchase.date
              Sort Key: purchase.userid, purchase.date
              Sort Method: external merge Disk: 7620904kB
              -> Seq Scan on public.purchase (cost=0.00..4910829.00 rows=300000000 width=16) (actual time=0.457..278058.430 rows=300000000 loops=1)
                    Output: purchase.userid, purchase.price, purchase.date
Planning time: 0.076 ms
Execution time: 566433.215 ms
My solution:
Aggregate (cost=28366.33..28366.34 rows=1 width=4) (actual time=53914.690..53914.690 rows=1 loops=1)
  Output: sum((SubPlan 1))
  -> Seq Scan on public.user2 u (cost=0.00..185.00 rows=12000 width=4) (actual time=0.021..3.816 rows=12000 loops=1)
        Output: u.id, u.name
        SubPlan 1
          -> Limit (cost=0.57..2.35 rows=1 width=12) (actual time=4.491..4.491 rows=1 loops=12000)
                Output: p.price, p.date
                -> Index Scan Backward using purchase_user_date on public.purchase p (cost=0.57..51389.67 rows=28977 width=12) (actual time=4.490..4.490 rows=1 loops=12000)
                      Output: p.price, p.date
                      Index Cond: (p.userid = u.id)
Planning time: 0.115 ms
Execution time: 53914.730 ms
My table contains 300 million records.
I don't know if it's relevant, but I also have an index on purchase (userid, date).
Upvotes: 1
Reputation: 175596
You can use a window function to rank each user's purchases by date and then aggregate the top-ranked rows with SUM:
WITH cte AS
(
SELECT *, RANK() OVER(PARTITION BY userid ORDER BY "date" DESC) AS r
FROM purchase
)
SELECT SUM(price) AS total
FROM cte
WHERE r = 1;
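Keep in mind that this solution includes ties: if a user has several purchases sharing the same latest date, RANK() gives all of them rank 1 and all of them are summed. To get only one purchase per user you need a column that is unique within each group (like a datetime), although even then ties are still possible.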
EDIT:
Handling ties:
CREATE TABLE purchase(
userid INTEGER NOT NULL
,date timestamp NOT NULL
,price INTEGER NOT NULL
);
INSERT INTO purchase(userid,date,price) VALUES
(1, timestamp'2016-01-06 12:00:00',10),
(1,timestamp'2016-01-05',5),
(2,timestamp'2016-01-06 13:00:00',12),
(2,timestamp'2016-01-05',15),
(2,timestamp'2016-01-06 13:00:00',1000);
Note the difference between RANK() and ROW_NUMBER():
SqlFiddleDemo_RANK
SqlFiddleDemo_ROW_NUMBER
SqlFiddleDemo_ROW_NUMBER_2
Output:
╔════════╦══════════════╦══════════════╗
║ RANK() ║ ROW_NUMBER() ║ ROW_NUMBER() ║
╠════════╬══════════════╬══════════════╣
║ 1022 ║ 22 ║ 1010 ║
╚════════╩══════════════╩══════════════╝
Without a UNIQUE index on (userid, date) there is always a possibility (probably small) of a tie. Any solution based on ORDER BY needs a deterministic ordering to return stable results.
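To make the three numbers in the output table above concrete, here is a small sketch in plain Python that mimics the semantics on the sample rows with the tie for user 2. It is only an illustration of the logic, not how PostgreSQL evaluates window functions; the function names and the choice of price as the tie-breaker are assumptions made for the example.

```python
from collections import defaultdict

# Sample rows from the answer: (userid, date, price); user 2 has a tie on the latest date.
rows = [
    (1, "2016-01-06 12:00:00", 10),
    (1, "2016-01-05 00:00:00", 5),
    (2, "2016-01-06 13:00:00", 12),
    (2, "2016-01-05 00:00:00", 15),
    (2, "2016-01-06 13:00:00", 1000),
]

def group_by_user(rows):
    """Collect (date, price) pairs per userid, like PARTITION BY userid."""
    groups = defaultdict(list)
    for userid, date, price in rows:
        groups[userid].append((date, price))
    return groups

def sum_rank_style(rows):
    """Like RANK() = 1: every row sharing the user's latest date is counted."""
    total = 0
    for purchases in group_by_user(rows).values():
        latest = max(date for date, _ in purchases)
        total += sum(price for date, price in purchases if date == latest)
    return total

def sum_row_number_style(rows, prefer_high_price):
    """Like ROW_NUMBER() = 1 with price as an explicit, deterministic tie-breaker."""
    total = 0
    for purchases in group_by_user(rows).values():
        latest = max(date for date, _ in purchases)
        tied = [price for date, price in purchases if date == latest]
        total += max(tied) if prefer_high_price else min(tied)
    return total

print(sum_rank_style(rows))               # 1022
print(sum_row_number_style(rows, False))  # 22
print(sum_row_number_style(rows, True))   # 1010
```

Without an explicit tie-breaker the ROW_NUMBER() variant could return either 22 or 1010, which is why a secondary sort key (or a unique constraint) is needed for repeatable results.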
Upvotes: 4
Reputation: 36097
You can use a LATERAL join in this case, so that the subquery picks the latest purchase for each distinct userid:
SELECT sum(price)
FROM (
select distinct userid FROM purchase
) u,
LATERAL (
SELECT price FROM purchase p
WHERE p.userid = u.userid
ORDER BY date DESC LIMIT 1
) x
demo: http://sqlfiddle.com/#!15/5569b/5
Upvotes: 1
Reputation:
To get the "latest" price you can use distinct on () in Postgres:
select distinct on (userid) userid, date, price
from the_table
order by userid, date desc
Now you only need to sum up all prices returned by the above statement:
select sum(price)
from (
select distinct on (userid) userid, price
from the_table
order by userid, date desc
) t;
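For readers unfamiliar with distinct on, the idea is "sort, then keep the first row of each group". A rough sketch of that logic in plain Python on the question's sample rows follows; it only illustrates the semantics, and the function name distinct_on_userid_latest is made up for the example.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (userid, date, price).
rows = [
    (1, "2016-01-06", 10),
    (1, "2016-01-05", 5),
    (2, "2016-01-06", 12),
    (2, "2016-01-05", 15),
]

def distinct_on_userid_latest(rows):
    """Emulate: SELECT DISTINCT ON (userid) ... ORDER BY userid, date DESC."""
    # Sort by date descending, then stable-sort by userid,
    # which yields the order "userid ASC, date DESC".
    ordered = sorted(rows, key=itemgetter(1), reverse=True)
    ordered.sort(key=itemgetter(0))
    # DISTINCT ON (userid): keep only the first row of each userid group.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]

latest = distinct_on_userid_latest(rows)
total = sum(price for _, _, price in latest)
print(latest)  # [(1, '2016-01-06', 10), (2, '2016-01-06', 12)]
print(total)   # 22
```

Note that this approach has to order the whole table before picking rows, which matches the large Sort step seen in the plan posted for this query on a 300-million-row table.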
Upvotes: 3