Simon Righley
Simon Righley

Reputation: 4969

aggregate function in recursive SQL

This question is an extended and simplified version of this question.

I have been trying to solve in SQL the following iterative equation:

U^{F,D}_{t,p} = (\sum_{D} U^{F,D}_{t-1,p} + C_{t-1,p} )*R^{F,D}_{t-1,p}

that results in:

enter image description here

The closest analogy I can think of is that U^{F,D}_{t,p} is a number of cars of brand F, of certain color (D), that a car dealer (p) has available at time t. So the above equation basically says: take the cars' units from the day before t-1 (i.e., U^{F,D}_{t-1,p}), sum over colors (\sum_{D}), then add to the sum a C value from the day before (C_{t-1,p}, whatever that is), and multiply by some other number R from the day before (R^{F,D}_{t-1,p}, whatever that is too).

Simplified problem

I have managed to solve a simplified form of the above equation, namely:

enter image description here

i.e., without the sum over cars' colors (D). The sample data, and the SQL query are in the fiddle that I link, but I paste it here for reference as well:

FULL DATA:

CREATE TABLE DYNAMICS ( T DATE, T_M1 DATE, P INTEGER, F VARCHAR(255), DELTA_F VARCHAR(255), R_T_M1 NUMBER, C_T_M1 NUMBER, U_T_M1 NUMBER, R_T NUMBER, C_T NUMBER, U_T NUMBER );  

-- DAY 1, P_1  
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 1,'BMW','RED', 0.5, 0.6, NULL, 0.7,0.8,100.0 );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 1,'MERCEDES','RED', 0.5, 0.6, NULL, 0.7,0.8,50.0 );  
-- DAY 1, P_2  
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 2,'BMW','RED', 0.5, 0.6, NULL, 0.7,0.8,10.0 );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 2,'MERCEDES','RED', 0.5, 0.6, NULL, 0.7,0.8,5.0 );  
-- DAY 2, P_1  
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'BMW','RED', 0.7, 0.8, 100, 0.9,0.9, NULL );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'MERCEDES','RED', 0.7, 0.8, 50, 0.6,0.5, NULL );  
-- DAY 2, P_2  
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'BMW','RED', 0.7, 0.8, 10, 0.7,0.8, NULL );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'MERCEDES','RED', 0.7, 0.8, 5, 0.3,0.3, NULL );  
-- DAY 3, P_1  
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'BMW','RED', 0.9, 0.9, NULL, 0.2,0.3, NULL );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'MERCEDES','RED', 0.6, 0.5, NULL, 1.7,1.8, NULL );  
-- DAY 3, P_2  
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'BMW','RED', 0.7, 0.8, NULL, 0.2,0.3, NULL );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'MERCEDES','RED', 0.3, 0.3, NULL, 0.8,0.9, NULL );  

SAMPLE DATA:

The following demonstrates example data for car dealer p=1, car model F=BMW of color D=RED (D from the mathematical equation is called DELTA in SQL). The initial condition (t=0) is here 2015-01-01. For all the days t, all the parameters at t (R_T, C_T), and t-1 (R_T_M1, C_T_M1) are given. Knowing them, the task is to compute cars' units for all the days t > t=0.

|                         T |                       T_M1 | P |   F | DELTA_F | R_T_M1 | C_T_M1 | U_T_M1 | R_T | C_T |    U_T |
|---------------------------|----------------------------|---|-----|---------|--------|--------|--------|-----|-----|--------|
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 1 | BMW |     RED |    0.5 |    0.6 | (null) | 0.7 | 0.8 |    100 |
| January, 02 2015 00:00:00 |  January, 01 2015 00:00:00 | 1 | BMW |     RED |    0.7 |    0.8 |    100 | 0.9 | 0.9 | (null) |
| January, 03 2015 00:00:00 |  January, 02 2015 00:00:00 | 1 | BMW |     RED |    0.9 |    0.9 | (null) | 0.2 | 0.3 | (null) |

QUERY:

In order to resolve the simplified problem, I have come up with the query in the linked fiddle that I paste here as well for reference:

-- 
-- SQL
-- T -> t 
-- T_M1 -> t-1 
-- 
WITH RECU(  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, U_T_M1, 
            R_T, C_T, U_T ) AS (
    -- Anchor member.
    SELECT  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, 
            U_T_M1, 
            R_T, C_T, 
            U_T
    FROM DYNAMICS 
        -- Initial condition: U_{t-1} does not exist, and U_{t=0} is given
        WHERE  ( U_T_M1 IS NULL AND U_T IS NOT NULL )
    UNION ALL
    -- Recursive member.
    SELECT  NEW.T, NEW.T_M1, NEW.P, NEW.F, NEW.DELTA_F,  
            NEW.R_T_M1, NEW.C_T_M1, 
            RECU.U_T AS U_T_M1,
            NEW.R_T, NEW.C_T, 
            -- Here the magic happens, i.e., (U_{t-1} + C_{t-1})*R_{t-1} = U_{t}
            (RECU.U_T+NEW.C_T_M1)*NEW.R_T_M1 AS U_T
    FROM DYNAMICS NEW 
    INNER JOIN RECU
    ON
        -- Translates: yesterday (t-1) of the new record equals today (t) of the parent record
        NEW.T_M1 = RECU.T AND 
        NEW.P = RECU.P AND 
        NEW.F = RECU.F AND 
        NEW.DELTA_F = RECU.DELTA_F 
)
SELECT * FROM  RECU ORDER BY P, F, T;

This query, for the example data pasted above, results in:

|                         T |                       T_M1 | P |   F | DELTA_F | R_T_M1 | C_T_M1 | U_T_M1 | R_T | C_T |    U_T |
|---------------------------|----------------------------|---|-----|---------|--------|--------|--------|-----|-----|--------|
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 1 | BMW |     RED |    0.5 |    0.6 | (null) | 0.7 | 0.8 |    100 |
| January, 02 2015 00:00:00 |  January, 01 2015 00:00:00 | 1 | BMW |     RED |    0.7 |    0.8 |    100 | 0.9 | 0.9 |  70.56 |
| January, 03 2015 00:00:00 |  January, 02 2015 00:00:00 | 1 | BMW |     RED |    0.9 |    0.9 |  70.56 | 0.2 | 0.3 | 64.314 |

Which works well, i.e., for: 2015-01-02, U_t = (100+0.8)*0.7 = 70.56, 2015-01-03, U_t = (70.56+0.9)*0.9 = 64.314.

The query is written in such a way that it works with different car dealers, and different car brands, which can be checked running the query in the linked fiddle

Coming back to the full problem

The query above cannot handle correctly the sum over cars' colors from the original equation:

enter image description here

This was irrelevant in the simplified data, since all cars (BMW and MERCEDES) occur there only in RED, and so the sum over colors effectively vanishes.

Such full logic should be probably implemented via a GROUP BY/SUM expression built in into the original query above. Unfortunately, I do not know how to do it.

So, imagine you have data in the shape like in the simplified problem section, but now every car brand exists in two colors, e.g., like in this linked fiddle:

|                         T |                       T_M1 | P |        F | DELTA_F | R_T_M1 | C_T_M1 | U_T_M1 | R_T | C_T |    U_T |
|---------------------------|----------------------------|---|----------|---------|--------|--------|--------|-----|-----|--------|
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 2 | MERCEDES |   BLACK |    0.2 |    0.6 | (null) | 0.5 | 0.8 |    5.5 |
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 2 | MERCEDES |     RED |    0.5 |    0.6 | (null) | 0.7 | 0.8 |      5 |
| January, 02 2015 00:00:00 |  January, 01 2015 00:00:00 | 2 | MERCEDES |   BLACK |    0.5 |    0.8 |    5.5 | 1.3 | 0.5 | (null) |
| January, 02 2015 00:00:00 |  January, 01 2015 00:00:00 | 2 | MERCEDES |     RED |    0.7 |    0.8 |      5 | 4.3 | 0.5 | (null) |
| January, 03 2015 00:00:00 |  January, 02 2015 00:00:00 | 2 | MERCEDES |   BLACK |    1.3 |    0.5 | (null) | 0.3 | 0.9 | (null) |
| January, 03 2015 00:00:00 |  January, 02 2015 00:00:00 | 2 | MERCEDES |     RED |    4.3 |    0.5 | (null) | 0.4 | 0.9 | (null) |

Given such data, you would expect for the dealer p=2 F=MERCEDES cars dynamics to look as follows:

U^{MERCEDES,BLACK}_{T=2015-01-02,P=2} = ( (5.5 + 5) + 0.8 )*0.5 = 11.3*0.5 = 5.65 
U^{MERCEDES,RED}_{T=2015-01-02,P=2} = ( (5.5 + 5) + 0.8 )*0.7 = 11.3*0.7 = 7.91

U^{MERCEDES,BLACK}_{T=2015-01-03,P=2} = ( (5.65 + 7.91) + 0.5 )*1.3 = 14.06*1.3 = 18.278
U^{MERCEDES,RED}_{T=2015-01-03,P=2} = ( (5.65 + 7.91) + 0.5 )*4.3 = 14.06*4.3 = 60.458

Question is how the simplified query above should be adjusted to solve this problem.

Upvotes: 9

Views: 174

Answers (2)

Simon Righley
Simon Righley

Reputation: 4969

The solution turned out to be easier than I had thought (although I have spent one day trying all sorts of things, and now all seems to be trivial).

The query working (tested) on the original fiddle data reads:

WITH RECU(  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, U_T_M1, 
            R_T, C_T, U_T ) AS (
    -- Anchor member.
    SELECT  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, 
            U_T_M1, 
            R_T, C_T, 
            U_T
    FROM DYNAMICS 
        -- Initial condition: U_{t-1} does not exist, and U_{t=0} is given
        WHERE  ( U_T_M1 IS NULL AND U_T IS NOT NULL )
    UNION ALL
    -- Recursive member.
    SELECT  NEW.T, NEW.T_M1, NEW.P, NEW.F, NEW.DELTA_F,  
            NEW.R_T_M1, NEW.C_T_M1, 
            RECU.U_T AS U_T_M1,
            NEW.R_T, NEW.C_T,
            -- Here the magic happens, i.e., (U_{t-1} + C_{t-1})*R_{t-1} = U_{t}
            ( (( SUM(RECU.U_T) OVER (PARTITION BY NEW.T, NEW.T_M1, NEW.P, NEW.F) ) + NEW.C_T_M1)*NEW.R_T_M1 ) AS U_T
    FROM DYNAMICS NEW 
    INNER JOIN RECU
    ON
        -- Translates: yesterday (t-1) of the new record equals today (t) of the parent record
        NEW.T_M1 = RECU.T AND 
        NEW.P = RECU.P AND 
        NEW.F = RECU.F AND 
        NEW.DELTA_F = RECU.DELTA_F 
)
SELECT * FROM RECU 
ORDER BY P, F, T, DELTA_F;

Which is a minimal change to the original query (only one line of the original query affected), and uses ORACLE analytic function.

Upvotes: 1

Cristian_I
Cristian_I

Reputation: 1625

I do not think this is the best answer but I think it gives you the result that you are looking for.

WITH RECU(  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, U_T_M1, 
            R_T, C_T, U_T ) AS (
    -- Anchor member.

    SELECT  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, 
            U_T_M1, 
            R_T, C_T, 
-- Start SUM of u_t
              (select sum(u_t) from DYNAMICS d2
               where d2.T=d1.T and d2.T_M1=d1.T_M1 and d2.P=d1.P and d2.F=d1.F
               group by T, T_M1, P, F) as u_t
-- End SUM of u_t   
    FROM DYNAMICS d1
        -- Initial condition: U_{t-1} does not exist, and U_{t=0} is given
        WHERE  ( U_T_M1 IS NULL AND U_T IS NOT NULL )
    UNION ALL
    -- Recursive member.

    SELECT  NEW.T, NEW.T_M1, NEW.P, NEW.F, NEW.DELTA_F,  
            NEW.R_T_M1, NEW.C_T_M1, 
            RECU.U_T AS U_T_M1,
            NEW.R_T, NEW.C_T
              , 
            -- Here the magic happens, i.e., (U_{t-1} + C_{t-1})*R_{t-1} = U_{t}
            (
              RECU.U_T
              +NEW.C_T_M1)*NEW.R_T_M1 AS U_T
    FROM DYNAMICS NEW 
    INNER JOIN RECU
    ON
        -- Translates: yesterday (t-1) of the new record equals today (t) of the parent record
        NEW.T_M1 = RECU.T AND 
        NEW.P = RECU.P AND 
        NEW.F = RECU.F AND 
        NEW.DELTA_F = RECU.DELTA_F 
)
SELECT * FROM  RECU ORDER BY P, F, T;

What i've added is between Start SUM of u_t and End SUM of u_t comments and here is the fiddle.

Upvotes: 2

Related Questions