Postgres: limit by the results of a sum function

Question

CREATE TABLE inventory_box (
 box_id varchar(10),
 value   integer
);

INSERT INTO inventory_box VALUES ('1', 10), ('2', 15), ('3', 20);

I prepared a sql fiddle with the schema.

I would like to select a list of inventory boxes with combined value of above 20

possible result 1. box 1 + box 2 (10 + 15 >= 20)

Here is what I am doing right now:

SELECT * FROM inventory_box LIMIT 1 OFFSET 0;
-- count on the client side and see if I got enough
-- got 10
SELECT * FROM inventory_box LIMIT 1 OFFSET 1;
-- count on the client side and see if I got enough
-- got 15, add it to the first query which returned 10
-- total is 25, ok, got enough, return answer

I am looking for a solution where the scan will stop as soon as it reaches the target value

Erwin Brandstetter · Accepted Answer

Your question update clarifies that your actual requirements are much simpler than a full-blown "subset sum problem" as suspected by Ulrich:
Just fetch rows until the sum is big enough.

I sort by box_id to get deterministic results. You might even drop the ORDER BY altogether to get any valid result a bit faster, yet.

Slow: Recursive CTE

WITH RECURSIVE i AS (
   SELECT *, row_number() OVER (ORDER BY box_id) AS rn
   FROM   inventory_box
   )
, r AS (
   SELECT box_id, val, val AS total, 2 AS rn
   FROM   i
   WHERE  rn = 1

   UNION ALL
   SELECT i.box_id, i.val, r.total + i.val, r.rn + 1
   FROM   r
   JOIN   i USING (rn)
   WHERE  r.total < 20
   )
SELECT box_id, val, total
FROM   r
ORDER  BY box_id;

Fast: PL/pgSQL function with `FOR` loop

Using sum() as window aggregate function (cheapest).

CREATE OR REPLACE FUNCTION f_shop_for(_total int)
  RETURNS TABLE (box_id text, val int, total int)
  LANGUAGE plpgsql STABLE AS
$func$
BEGIN
   total := 0;

   FOR box_id, val, total IN
      SELECT i.box_id, i.val
           , sum(i.val) OVER (ORDER BY i.box_id) AS total
      FROM   inventory_box i
   LOOP
      RETURN NEXT;
      EXIT WHEN total >= _total;
   END LOOP; 
END
$func$;

Call:

SELECT * FROM f_shop_for(20);

I tested both with a big table of 1 million rows. The function only reads the necessary rows from index and table. The CTE is very slow, seems to scan the whole table ...

fiddle
_{OLD sqlfiddle}

Aside: sorting by a varchar column (box_id) with numeric data yields dubious results. Should be a numeric type, really?

Postgres: limit by the results of a sum function

Answers (2)

Slow: Recursive CTE

Fast: PL/pgSQL function with `FOR` loop

Related Questions

Postgres: limit by the results of a sum function

Answers (2)

Slow: Recursive CTE

Fast: PL/pgSQL function with FOR loop

Related Questions

Fast: PL/pgSQL function with `FOR` loop