Reputation: 20580
CREATE TABLE inventory_box (
box_id varchar(10),
value integer
);
INSERT INTO inventory_box VALUES ('1', 10), ('2', 15), ('3', 20);
I prepared a sql fiddle with the schema.
I would like to select a list of inventory boxes with combined value of above 20
possible result 1. box 1 + box 2 (10 + 15 >= 20)
Here is what I am doing right now:
SELECT * FROM inventory_box LIMIT 1 OFFSET 0;
-- count on the client side and see if I got enough
-- got 10
SELECT * FROM inventory_box LIMIT 1 OFFSET 1;
-- count on the client side and see if I got enough
-- got 15, add it to the first query which returned 10
-- total is 25, ok, got enough, return answer
I am looking for a solution where the scan will stop as soon as it reaches the target value
Upvotes: 0
Views: 4441
Reputation: 656794
Your question update clarifies that your actual requirements are much simpler than a full-blown "subset sum problem" as suspected by Ulrich:
Just fetch rows until the sum is big enough.
I sort by box_id
to get deterministic results. You might even drop the ORDER BY
altogether to get any valid result a bit faster, yet.
WITH RECURSIVE i AS (
SELECT *, row_number() OVER (ORDER BY box_id) AS rn
FROM inventory_box
)
, r AS (
SELECT box_id, val, val AS total, 2 AS rn
FROM i
WHERE rn = 1
UNION ALL
SELECT i.box_id, i.val, r.total + i.val, r.rn + 1
FROM r
JOIN i USING (rn)
WHERE r.total < 20
)
SELECT box_id, val, total
FROM r
ORDER BY box_id;
FOR
loopUsing sum()
as window aggregate function (cheapest).
CREATE OR REPLACE FUNCTION f_shop_for(_total int)
RETURNS TABLE (box_id text, val int, total int)
LANGUAGE plpgsql STABLE AS
$func$
BEGIN
total := 0;
FOR box_id, val, total IN
SELECT i.box_id, i.val
, sum(i.val) OVER (ORDER BY i.box_id) AS total
FROM inventory_box i
LOOP
RETURN NEXT;
EXIT WHEN total >= _total;
END LOOP;
END
$func$;
Call:
SELECT * FROM f_shop_for(20);
I tested both with a big table of 1 million rows. The function only reads the necessary rows from index and table. The CTE is very slow, seems to scan the whole table ...
Aside: sorting by a varchar
column (box_id
) with numeric data yields dubious results. Should be a numeric type, really?
Upvotes: 2
Reputation: 324475
One possible approach scans the table in box_id order until the total is above 30, then returns all the previous rows plus the row that tipped the sum over the limit. Note that the scan doesn't stop when the sum is reached, it totals the whole table then goes back over the results to pick the results.
http://sqlfiddle.com/#!15/1c502/4
SELECT
array_agg(box_id ORDER BY box_id) AS box_ids,
max(boxsum) AS boxsum
FROM
(
SELECT
box_id,
sum(value) OVER (ORDER BY box_id) AS boxsum,
sum(value) OVER (ORDER BY box_id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prevboxsum
FROM
inventory_box
) x
WHERE prevboxsum < 30 OR prevboxsum IS NULL;
but really, this is going to be pretty gruesome to do in a general and reliable manner in SQL (or at all).
You can ORDER BY value ASC
instead of ORDER BY box_id
if you like; this will add boxes from the smallest to the biggest. However, this will catastrophically fail if you then remove all the small boxes from the pool and run it again, and repeat. Soon it'll just be lumping two big boxes together inefficiently.
To solve this for the general case, finding the smallest combination, is a hard optimization problem that probably benefits from imprecise sample- and probabilistic based methods.
To scan the table in order until the sum reaches the target, lock the table then use PL/PgSQL to read rows from a cursor that returns the rows in value
order plus an array_agg(box_id) OVER (ORDER BY value)
and sum(value) OVER (order by value)
. When you reach the desired sum, return the current row's array. This won't produce an optimal solution, but it'll produce a solution, and I think it'll do so without a full table scan if there's a suitable index in place.
Upvotes: 3