mimo
mimo

Reputation: 2629

Select n first elements such that accumulated sum reaches a given value

Starting from the PostgreSQL query

SELECT filename FROM files_storage ORDER BY date;

I would like to reduce the output table to the first n rows such that the accumulated sum from row 1 to n of a column called size reaches at least max_value.

Example:

date         filename   size
2016-09-01   /a/aaa/    20
2016-09-02   /a/bbb/    70
2016-09-03   /a/ccc/    20
2016-09-04   /a/ddd/    30
2016-09-05   /a/eee/    50

If max_value is 100, I want to return the first three rows because 20 + 70 + 20 >= 100.

I have seen answers here to similar questions, but nothing in PostgreSQL.

Upvotes: 2

Views: 63

Answers (2)

redneb
redneb

Reputation: 23920

Here's my take:

SELECT filename, size
FROM (
    SELECT
        filename,
        size,
        coalesce(sum(size) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS sum
    FROM files_storage
) t
WHERE sum<100

I use sum as a window function here to calculate for each file, the sum of the sizes of all previous files (not including the current one). Then I filter the rows based on whether that number is less than the threshold. Having excluded the current file from the sum ensures that we will get one row more, which is going to be the files that tips the sum over the threshold.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1271091

Use the cumulative sum functionality:

SELECT fs.*
FROM (SELECT fs.*, SUM(size) OVER (ORDER BY date) as running_sum
      FROM files_storage
     ) fs
WHERE running_sum >= 100 AND running_sum - size < 100;

Oh, that gets the first row that crosses the boundary.

You want all of them, so instead:

SELECT fs.*
FROM (SELECT fs.*, SUM(size) OVER (ORDER BY date) as running_sum
      FROM files_storage
     ) fs
WHERE running_sum - size < 100;

If you can have duplicate dates and arbitrarily want one value when duplicate values on the same date might apply:

SELECT fs.*
FROM (SELECT fs.*,
             SUM(size) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as running_sum
      FROM files_storage
     ) fs
WHERE running_sum - size < 100;

Upvotes: 0

Related Questions