Starting from the PostgreSQL query
SELECT filename FROM files_storage ORDER BY date;
I would like to reduce the output to the first n rows, such that the accumulated sum of a column called size over rows 1 to n reaches at least max_value.
Example:
date        filename  size
2016-09-01  /a/aaa/     20
2016-09-02  /a/bbb/     70
2016-09-03  /a/ccc/     20
2016-09-04  /a/ddd/     30
2016-09-05  /a/eee/     50
If max_value is 100, I want to return the first three rows, because 20 + 70 + 20 >= 100.
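For anyone who wants to reproduce this, here is a minimal setup matching the example above (the table and column names come from the question; the column types are assumptions):

```sql
-- Hypothetical schema matching the question's example data.
CREATE TABLE files_storage (
    date     date,
    filename text,
    size     integer
);

INSERT INTO files_storage (date, filename, size) VALUES
    ('2016-09-01', '/a/aaa/', 20),
    ('2016-09-02', '/a/bbb/', 70),
    ('2016-09-03', '/a/ccc/', 20),
    ('2016-09-04', '/a/ddd/', 30),
    ('2016-09-05', '/a/eee/', 50);
```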
I have seen answers here to similar questions, but nothing in PostgreSQL.
Upvotes: 2
Here's my take:
SELECT filename, size
FROM (
    SELECT
        filename,
        size,
        date,
        COALESCE(SUM(size) OVER (ORDER BY date
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS prev_sum
    FROM files_storage
) t
WHERE prev_sum < 100
ORDER BY date;
I use SUM as a window function here to calculate, for each file, the sum of the sizes of all previous files (not including the current one). Then I filter the rows based on whether that number is less than the threshold. Excluding the current file from the sum ensures that we get one extra row, which is the file that tips the sum over the threshold.
Upvotes: 2
Use the cumulative sum functionality:
SELECT fs.*
FROM (SELECT fs.*, SUM(size) OVER (ORDER BY date) AS running_sum
      FROM files_storage fs
     ) fs
WHERE running_sum >= 100 AND running_sum - size < 100;
Oh, that gets only the first row that crosses the boundary. You want all of them, so instead:
SELECT fs.*
FROM (SELECT fs.*, SUM(size) OVER (ORDER BY date) AS running_sum
      FROM files_storage fs
     ) fs
WHERE running_sum - size < 100;
If you can have duplicate dates, note that the default window frame (RANGE) treats all rows with the same date as peers and gives them the same running sum; switching to ROWS accumulates one row at a time (in an arbitrary order among ties):
SELECT fs.*
FROM (SELECT fs.*,
             SUM(size) OVER (ORDER BY date
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
      FROM files_storage fs
     ) fs
WHERE running_sum - size < 100;
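In all of these queries, 100 stands in for max_value. If you want it as a real parameter rather than a literal, one option is a prepared statement (the statement name `first_n_files` below is made up for illustration; `$1` is the threshold):

```sql
-- Hypothetical parameterized form: $1 plays the role of max_value.
PREPARE first_n_files (integer) AS
SELECT fs.*
FROM (SELECT fs.*,
             SUM(size) OVER (ORDER BY date
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
      FROM files_storage fs
     ) fs
WHERE running_sum - size < $1;

EXECUTE first_n_files(100);
```

From application code you would normally pass the threshold through your driver's bind-parameter mechanism instead of PREPARE/EXECUTE.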
Upvotes: 0