Sql Query to find specific sum in rows

Here's a list of paletts generated by this query :

select STO_COD_ART PRODUCT_NAME,STO_NUMPAL PALETTE_NUMBER,STO_PNET NET_WEIGHT
from GCSTOCK
where STO_COD_ART = '739155VPFK500'
and STO_OUTDATE is null;
PRODUCT_NAME PALETTE_NUMBER NET_WEIGHT
739155VPFK500 227881 770
739155VPFK500 241691 894
739155VPFK500 246221 975
739155VPFK500 246471 861
739155VPFK500 246531 882

(derived from this image)

I'm trying to select the minimum number of rows that have a total sum sum(NET_WEIGHT) between 1600 and 1700. I have tried with the following query, but it doesn't return the expected results:

select STO_COD_ART PRODUCT_NAME,STO_NUMPAL PALETTE_NUMBER,STO_PNET NET_WEIGHT
from GCSTOCK
where STO_COD_ART = '739155VPFK500'
and STO_OUTDATE is null
group by STO_COD_ART, STO_NUMPAL, STO_PNET
having (SUM(STO_PNET) >= 1600 and SUM(STO_PNET) <= 1700);

I believe the attempt to use GROUP BY with a HAVING CLAUSE doesn't work because I need the results in the original form, not 1 record with the grouped result, and I need some way to verify that the least number of possible records was used to create the group.

The expected result would be:

PRODUCT_NAME PALETTE_NUMBER NET_WEIGHT
739155VPFK500 227881 770
739155VPFK500 241691 894

Thanks for your help

Upvotes: 0

Views: 77

Answers (1)

Chris Schaller
Chris Schaller

Reputation: 16679

The concept of picking the minimum number of pallets to satisfy a quantity is based on the idea that fetching and moving pallets around is labor/time intensive and should be minimised to achieve optimal throughout.

If this is a test question then great, this answer will work, but the concept is flawed in reality as it theoretically requires an unlimited amount of storage over the lifetime of the warehouse if it will be receiving stock as well. In reality we pick either whole pallets first (if they do not need to be re-packaged) or we pick from pallets with the lowest qty first, to maximise the use of the storage space and make room for new arrivals into the warehouse. If spoilage or age of the goods needs to be minimised, then you would pick from the oldest pallet first.

To obtain the minimum number of pallets, we really need to evaluate all or at least many combinations of pallets and then select the combination with the fewest pallets. In SQL we can use recursive CTE to build a matrix of all the possible combinations that we want to consider, then we can sort these results according to your business logic and take the first result from that list as the optimal combination.

You would then take the optimal combination and re-query from the original source to produce the results that your application is expecting.

In a procedural code process, which could be performed in C#, Python, Javascript or even in a stored procedure, we would normally use an algorithm that selects the largest pallet first, then iterate over the remaining pallets and pick the next largest pallet that still satisfies the criteria. The Recursive CTE can produce the same results but it odes so in a different way.

An MS Sql Server version of a Recursive CTE for this can be viewed here, In that example I have added some additional pallet definitions and allowed a maximum combination of 4 pallets. You may choose to allow more or less combinations as you need.

The matrix formed from the pallets assuming the optimal criteria is the combination with the largest combined weight is shown by this table:

PIndex Pallet_1 Pallet_2 Pallet_3 Pallet_4 NET_WEIGHT
2 241691 227881 (null) (null) 1664
2 246531 227881 (null) (null) 1652
2 246471 227881 (null) (null) 1631

But if the optimal criteria is to pick the oldest pallets first (and a lower pallet number indicates an older pallet) then the sequence of the items in the matrix is diffrerent: http://sqlfiddle.com/#!18/201c3/6

PIndex Pallet_1 Pallet_2 Pallet_3 Pallet_4 NET_WEIGHT
2 241691 227881 (null) (null) 1664
2 246471 227881 (null) (null) 1631
2 246531 227881 (null) (null) 1652

Subtle, and doesn't change the outcome in this particular dataset, but a dataset that has different pallets and quantities, or indeed if you run this again after removing the previous pallets, then the ordering may be important to you.

You may be able to use the first record, so TOP 1 from this set in your application, but if it was important to select other attributes from original dataset, we can simply use this output to filter (or join) the source data:

See complete fiddle: http://sqlfiddle.com/#!18/201c3/7

WITH FilteredPallets as (
    SELECT STO_COD_ART,STO_NUMPAL,STO_PNET
    FROM GCSTOCK
    WHERE STO_COD_ART = '739155VPFK500'
      AND STO_OUTDATE is null
      AND STO_PNET <= 1700
), PalletHierarchy as (
    SELECT 1 as PIndex
         , STO_NUMPAL as Pallet_1, CAST(null as int) as Pallet_2, CAST(null as int) as Pallet_3, CAST(null as int) as Pallet_4
         , STO_PNET as NET_WEIGHT, STO_PNET as Last_Weight
    FROM FilteredPallets
    UNION ALL
    SELECT 2
         , P1.Pallet_1, P2.STO_NUMPAL as Pallet_2, null, null
         , P2.STO_PNET + P1.NET_WEIGHT as NET_WEIGHT, P2.STO_PNET as Last_Weight
    FROM PalletHierarchy P1, FilteredPallets P2
    WHERE P1.PIndex = 1
      AND P2.STO_NUMPAL <> P1.Pallet_1
      AND P2.STO_PNET <= P1.Last_Weight 
      AND (P2.STO_PNET + P1.NET_WEIGHT) <= 1700
    UNION ALL
    SELECT 3
         , P1.Pallet_1, P1.Pallet_2, P2.STO_NUMPAL as Pallet_3, null
         , P2.STO_PNET + P1.NET_WEIGHT as NET_WEIGHT, P2.STO_PNET as Last_Weight
    FROM PalletHierarchy P1, FilteredPallets P2
    WHERE P1.PIndex = 2
      AND P2.STO_NUMPAL <> P1.Pallet_1 AND P2.STO_NUMPAL <> P1.Pallet_2
      AND P2.STO_PNET <= P1.Last_Weight 
      AND (P2.STO_PNET + P1.NET_WEIGHT) <= 1700
      UNION ALL
    SELECT 4
         , P1.Pallet_1, P1.Pallet_2, P1.Pallet_3, P2.STO_NUMPAL as Pallet_4
         , P2.STO_PNET + P1.NET_WEIGHT as NET_WEIGHT, P2.STO_PNET as Last_Weight
    FROM PalletHierarchy P1, FilteredPallets P2
    WHERE P1.PIndex = 3
      AND P2.STO_NUMPAL <> P1.Pallet_1 AND P2.STO_NUMPAL <> P1.Pallet_2 AND P2.STO_NUMPAL <> P1.Pallet_3
      AND P2.STO_PNET <= P1.Last_Weight 
      AND (P2.STO_PNET + P1.NET_WEIGHT) <= 1700
 ), Optimal as (
    SELECT TOP 1 * FROM PalletHierarchy   
    WHERE NET_WEIGHT >= 1600
    --ORDER BY PIndex ASC, NET_WEIGHT DESC
    -- sort by age
    ORDER BY PIndex ASC, Pallet_1, Pallet_2, Pallet_3, Pallet_4
)
SELECT STO_COD_ART as PRODUCT_NAME,STO_NUMPAL as PALETTE_NUMBER,STO_PNET as NET_WEIGHT
FROM GCSTOCK
INNER JOIN Optimal P1 ON P1.Pallet_1 = GCSTOCK.STO_NUMPAL 
                      OR P1.Pallet_2 = GCSTOCK.STO_NUMPAL 
                      OR P1.Pallet_3 = GCSTOCK.STO_NUMPAL 
                      OR P1.Pallet_4 = GCSTOCK.STO_NUMPAL  

Results:

PRODUCT_NAME PALETTE_NUMBER NET_WEIGHT
739155VPFK500 227881 770
739155VPFK500 241691 894

Upvotes: 1

Related Questions