spinjector
spinjector

Reputation: 3515

In DB2 SQL, how can I return a variable number of rows with FETCH FIRST (n) ROWS ONLY..?

How can I return a variable (n) number of rows with FETCH FIRST (n) ROWS ONLY in DB2 SQL..? This is for DB2/400 v7r3.

The documentation for FETCH does clearly state it cannot be done, but I cannot imagine another way to do it...

...fetch-row-count must not contain a scalar-fullselect, a column reference, a table reference, a user-defined function reference, or a built-in scalar function...

...which I take as meaning it must be a constant such as "10", and cannot be a variable or column name, which unfortunately is exactly what I would like to do.

This works:

SELECT      PREFILTER.*
FROM        PREFILTER
INNER JOIN  GT1 ON FILTERED.GTAMT=GT1.LOSTAMT
ORDER BY    GTDATE DESC
FETCH FIRST 10 ROWS ONLY

This does not work:

SELECT      PREFILTER.*
FROM        PREFILTER
INNER JOIN  GT1 ON PREFILTER.GTAMT=GT1.LOSTAMT
ORDER BY    GTDATE DESC
FETCH FIRST (GT1.LOSTAMT) ROWS ONLY    <=== changed here

The following error is received:

SQL Error [428H7]: [SQ20467] Expression containing LOSTAMT must calculate a constant value.

This query is intended to find financial transaction records which are orphaned or keyed incorrectly in a larger table with rather loose normalization & referential integrity (or lack thereof).

PREFILTER is a query which returns a subset of the main table, and GT1 is another query which calculates a smaller & more complex subset of those records. Then JOIN them, and return (n) rows as specified by the column LOSTQTY, in descending date order. So it should only return the (n) most recent records.

Note that I do recognize my placement of FETCH is incorrect, and it (or whatever it morphs into) would likely have to move into one of the CTE queries such as GT1.

Also, at this early stage, it seems apparent I'm getting some Cartesian results, but that will likely be settled once the FETCH issue is resolved.

For reference, here is the full SQL of the project:

WITH  --SET THE INITIAL ACCOUNT & DATE RANGE  
        PREFILTER    AS  (   SELECT      *
                            FROM        GLTRANT
                            WHERE       GTDATE > 20170000
                            AND         GTACCT=112068
                        ),
      --CREATE LIST OF ALL POSITIVE VALUES
        POSVALS     AS  (   SELECT      GTAMT      AS POSAMT, COUNT(GTAMT) AS POSC
                            FROM        PREFILTER 
                            WHERE       GTAMT > 0
                            GROUP BY    GTAMT
                        ),
      --CREATE LIST OF ALL NEGATIVE VALUES, WITH SIGN DROPPED
        NEGVALS     AS  (   SELECT      ABS(GTAMT) AS NEGAMT, COUNT(GTAMT) AS NEGC
                            FROM        PREFILTER 
                            WHERE       GTAMT < 0
                            GROUP BY    ABS(GTAMT)
                        ),
      --CALCULATE DISCREPANCIES BETWEEN THE TWO LISTS. SUBTRACT THE TWO AND MULTIPLY THE SIGN BY THE ABSOLUTE VALUE 
      --OF THE DIFFERENCE. THEN TO RESTORE THE SIGN, MULTIPLY THE AMOUNT BY THE SIGN OF THE DIFFERENCE. 
        FOJ         AS  (   SELECT      SIGN(COALESCE(POSC,0)-COALESCE(NEGC,0))*COALESCE(POSAMT,NEGAMT) AS LOSTAMT, 
                                        ABS (COALESCE(POSC,0)-COALESCE(NEGC,0))                         AS LOSTQTY
                            FROM        POSVALS 
                            FULL OUTER JOIN NEGVALS ON POSAMT=NEGAMT
                            WHERE       COALESCE(POSC,0)-COALESCE(NEGC,0) <> 0
                        ),
      --GET DISCREPANCIES WITH COUNT >1 
        GT1         AS  (   SELECT      *
                            FROM        FOJ
                            WHERE       LOSTQTY>1 
                        )

--SEARCH PREFILTER FOR EACH AMOUNT (LOSTAMT) IN GT1 AND RETURN THE MOST RECENT (LOSTQTY) RECORDS 
SELECT      PREFILTER.*
FROM        PREFILTER
INNER JOIN  GT1 ON PREFILTER.GTAMT=GT1.LOSTAMT
ORDER BY    GTDATE DESC
FETCH FIRST (GT1.LOSTQTY) ROWS ONLY --DOES NOT WORK

Upvotes: 0

Views: 2429

Answers (4)

WarrenT
WarrenT

Reputation: 4532

The reference page does indeed say that you cannot use a column reference, as you attempted to do, as the expression of the number of rows to return.

What it does allow is an expression. There are rules as to what it says is not allowed, but those rules do not exclude the use of a host variable.

    FETCH FIRST :xqty ROWS ONLY

Upvotes: 0

spinjector
spinjector

Reputation: 3515

Ok, after absorbing Mustaccio's previous answer and banging my head on the IBM documentation all day, I was able to work this out. Much thanks to Mustaccio for pointing me in the right direction.

Two important bits: the location of the ROW_NUMBER() had to move further down; it didn't work for this where it was. Also, it needed the addition of the PARTITION BY parameter so the numbering restarts with each new group of values. So it goes something like 1-2-1-2-1-2-3 instead of 1-2-3-4-5-6-7. This is critical for WHERE RN <= LOSTQTY to work.

Here's the full result:

WITH  --SET THE INITIAL ACCOUNT & DATE RANGE  
        PREFILTER   AS  (   SELECT  GTCOMP, GTACCT, GTDATE, GTSRCE, GTREF#, GTENT#, GTAMT, GTDESC, "GTPO#", 
                                    "GTCHK#", "GTINV#", GTCKAC, GT1099, GTXXX1, GTAFLG, GTVEND, "GTBAT#"
                            FROM        F_CERTOB.GLTRANT
                            WHERE       GTDATE > 20180000
                            AND         GTACCT=112068
                        ),
      --CREATE LIST OF ALL POSITIVE VALUES
        POSVALS     AS  (   SELECT      GTAMT      AS POSAMT, COUNT(GTAMT) AS POSC
                            FROM        PREFILTER 
                            WHERE       GTAMT > 0
                            GROUP BY    GTAMT
                        ),
      --CREATE LIST OF ALL NEGATIVE VALUES, WITH SIGN DROPPED
        NEGVALS     AS  (   SELECT      ABS(GTAMT) AS NEGAMT, COUNT(GTAMT) AS NEGC
                            FROM        PREFILTER 
                            WHERE       GTAMT < 0
                            GROUP BY    ABS(GTAMT)
                        ),
      --CALCULATE DISCREPANCIES BETWEEN THE TWO LISTS. SUBTRACT THE "DE-SIGNED" NEGATIVE FROM THE POSITIVE AND 
      --MULTIPLY THE SIGN BY THE ABSOLUTE VALUE OF THE DIFFERENCE. THEN TO RESTORE THE SIGN, MULTIPLY THE AMOUNT 
      --BY THE SIGN OF THE DIFFERENCE. THIS IS A FULL OUTER JOIN, SO NULLS ARE A GIVEN, AND COALESCE() IS USED 
      --TO FILL IN THE HOLES.  
        FOJ         AS  (   SELECT      SIGN(COALESCE(POSC,0)-COALESCE(NEGC,0))*COALESCE(POSAMT,NEGAMT) AS LOSTAMT, 
                                        ABS (COALESCE(POSC,0)-COALESCE(NEGC,0))                         AS LOSTQTY
                            FROM        POSVALS 
                            FULL OUTER JOIN NEGVALS ON POSAMT=NEGAMT
                            WHERE       COALESCE(POSC,0)-COALESCE(NEGC,0) <> 0
                        ),
      --THIS IS AN EXTRA KNOB TO CONTROL THE NUMBER OF RESULTS DURING DEVELOPMENT. IF SET TO >1, IT WILL SHOW
      --ONLY THE DATA THAT WOULD RETURN TWO OR MORE ROWS. USEFUL WHEN 99% OF THE DATA WOULD BE ONE ROW. THIS
      --WAS NEEDED TO DEVELEOP & TEST "ROW_NUMBER() OVER (PARTITION BY GTAMT ORDER BY GTDATE DESC) AS RN" AND
      --IF THE NUMBERING WAS RESTARTING PROPERLY FOR EACH GROUP OF VALUES.
        GT          AS  (   SELECT      *
                            FROM        FOJ
                            WHERE       LOSTQTY>0
                        ),
      --RETRIEVE THE ITEMS, RANK THEM BY AMT & DATE. USE PARTITON-BY TO RESTART ROW NUMBERING FOR EACH GROUP 
      --OF AMOUNTS. USE ORDER-BY TO NUMBER BY DATE IN DESCENDING ORDER. 
        GROUPED     AS  (   SELECT      PREFILTER.*, LOSTQTY, 
                                        ROW_NUMBER() OVER (PARTITION BY GTAMT ORDER BY GTDATE DESC) AS RN
                            FROM        PREFILTER
                            INNER JOIN  GT ON GTAMT=LOSTAMT
                        ),
      --NARROW IT DOWN TO ONLY THE TOP (n) ITEMS
        RECENT      AS  (   SELECT      *
                            FROM        GROUPED
                            WHERE       RN <= LOSTQTY
                        )

SELECT      *
FROM        RECENT
ORDER BY    GTAMT

Upvotes: 0

mustaccio
mustaccio

Reputation: 18945

You should be able to use a window function ROW_NUMBER for this:

WITH  --SET THE INITIAL ACCOUNT & DATE RANGE  
        PREFILTER    AS  (  
          SELECT  T.*,
                  ROW_NUMBER() OVER (ORDER BY T.GTDATE DESC) AS RN
          FROM (
                            SELECT      *
                            FROM        GLTRANT
                            WHERE       GTDATE > 20170000
                            AND         GTACCT=112068
          ) T
        ),
...
--SEARCH PREFILTER FOR EACH AMOUNT (LOSTAMT) IN GT1 AND RETURN THE MOST RECENT (LOSTQTY) RECORDS 
SELECT      PREFILTER.*
FROM        PREFILTER
INNER JOIN  GT1 ON PREFILTER.GTAMT=GT1.LOSTAMT
WHERE RN <= GT1.LOSTQTY
ORDER BY    GTDATE DESC

It assigns consecutive numbers for each row in PREFILTER in the descending order of GTDATE, which you can later use to limit the result set.

Upvotes: 4

Player1st
Player1st

Reputation: 1605

I'm not sure you can do what you want in a single statement. However, you should be able to put together a stored procedure or SQL function that can do this instead. In the stored procedure, put together the query and then open it with a cursor. The cursor can then fetch the number of rows you want, put that into a result set, and return that result set from the procedure. Current documentation from IBM on that is here: https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_73/sqlp/rbafyresultsets.htm

Upvotes: 1

Related Questions