zeroflaw
zeroflaw

Reputation: 564

SQL - How to do Window function if there is NULL value?

First, I have this information:

  1. Weight A
  2. Weight B
  3. Relationship B to A: 1 to Many

Such that, below result can be obtained:

A_Id Weight A Weight B B_Id
1 3 16 1
2 5 16 1
3 6 16 1
4 7 16 1
5 2 12 2
6 6 12 2

Now, adding two more columns: Sum Weight A By B_Id, Accumulative Difference (consider below as table t2)

A_Id Weight A Sum Weight A By B_Id Weight B B_Id Accumulative Diff
1 3 21 16 1 5
2 5 21 16 1 5
3 6 21 16 1 5
4 7 21 16 1 5
5 2 8 12 2 1
6 6 8 12 2 1

For example above,

  1. first row accumulative difference => 21 - 16 = 5

  2. the fifth row accumulative difference => (21 + 8) - (16 + 12) = 1

So, my objective, is to compute such 'Accumulative Difference' The entire result is to be displayed in the report.

Technically, by using 'Window Functions', this can be achieved without problem. First, I have to create 2 more columns: Accumulate Weight A By B_Id, Accumulate Weight B. Then, just find the difference between the two.

I actually need 3 more columns:

A_Id Weight A Sum Weight A By B_Id Weight B B_Id Row By B_Id Accumulate Weight A By B_Id Accumulate Weight B Accumulative Diff
1 3 21 16 1 1 21 16 5
2 5 21 16 1 2 21 16 5
3 6 21 16 1 3 21 16 5
4 7 21 16 1 4 21 16 5
5 2 8 12 2 1 29 28 1
6 6 8 12 2 2 29 28 1

sample SQL (to generatet2):

SELECT *, [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (PARTITION BY ... ORDER BY B_Id), [Accumulate Weight B] = SUM(WeightB) OVER (PARTITION BY ... ORDER BY B_Id) FROM t2 -- (...) could be by date year month -- Accumulate Weight B can set to only 1st row, etc
;WITH tableA AS (
SELECT [A_Id] = 1, [Weight] = 3, [B_Id] = 1, [date] = '2021-10-01'
UNION
SELECT [A_Id] = 2, [Weight] = 5, [B_Id] = 1, [date] = '2021-10-02'
UNION
SELECT [A_Id] = 3, [Weight] = 6, [B_Id] = 1, [date] = '2021-10-03'
UNION
SELECT [A_Id] = 4, [Weight] = 7, [B_Id] = 1, [date] = '2021-10-04'
UNION
SELECT [A_Id] = 5, [Weight] = 2, [B_Id] = 2, [date] = '2021-10-05'
UNION
SELECT [A_Id] = 6, [Weight] = 6, [B_Id] = 2, [date] = '2021-10-06'
    
--Uncomment for testing NULL value
--UNION
--SELECT [A_Id] = 7, [Weight] = 9, [B_Id] = NULL, [date] = '2021-10-07'
--UNION
--SELECT [A_Id] = 8, [Weight] = 10, [B_Id] = 3, [date] = '2021-10-08'
    
),
tableB AS (
     SELECT [B_Id] = 1, [Weight] = 16, [date] = '2021-10-03'
     UNION
     SELECT [B_Id] = 2, [Weight] = 12, [date] = '2021-10-06'

    --Uncomment for testing NULL value
    --UNION
    --SELECT [B_Id] = 3, [Weight] = 8, [date] = '2021-10-08'
),
t1a AS (
    SELECT 
        [A_Id] = tableA.A_Id,
        [WeightA] = tableA.Weight,
        [WeightB] = tableB.Weight,
        [B_Id] = tableB.B_Id,
        [Row By B_Id] = ROW_NUMBER() OVER(PARTITION BY tableB.B_Id ORDER BY A_Id)
    FROM 
        tableA 
    FULL JOIN tableB ON tableA.B_Id = tableB.B_Id
),
t1b AS (
    SELECT
        *,
        [Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY B_Id),
        [Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY B_Id)
    FROM t1a
),
t2 AS (
    SELECT 
        *,
        [Accumulate Difference] = [Sum Weight A By B_Id] - [Accumulate Weight B]
    FROM t1b
)
SELECT 
    *
FROM t2

Now, the problem comes, IF one of the B_Id is NULL. (Uncomment the parts to generate NULL B_Id)

Below is my expected result, especially on the highlighted row:

A_Id Weight A Sum Weight A By B_Id Weight B B_Id Accumulate Weight A By B_Id Accumulate Weight B Accumulative Diff
1 3 21 16 1 21 16 5
2 5 21 16 1 21 16 5
3 6 21 16 1 21 16 5
4 7 21 16 1 21 16 5
5 2 8 12 2 29 28 1
6 6 8 12 2 29 28 1
7 9 9 0 NULL 38 28 10
8 7 10 8 3 48 36 12
9 3 10 8 3 48 36 12

However, with my sample query, this is not working. Instead, below appears:

current result

The NULL B_Id appears on the first row instead. (The order is messed up)

So my question, how to handle such situation? (Keep the original row as in the expected result)

As why the order is in such as way? (raised by @ThorstenKettner)

The order by default, is based B_TransactionDatetime. If B_Id is NULL, then it will be based on A_TransactionDatetime. So, I compute another column RefDateTime = COALESCE(B_TransactionDatetime, A_TransactionDatetime), and order based on that.

PS:

Inspired by @ThorstenKettner, i should use the RefDateTime in the window function, i.e:

[Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY RefDateTime),
[Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY RefDateTime)

Case closed.

Upvotes: 0

Views: 3448

Answers (3)

Thorsten Kettner
Thorsten Kettner

Reputation: 95090

You want to outer join B to A, because not every A has an associated B.

Then you look at the rows blockwise. One block is either all rows.belonging to one B or a single A row that doesn't have a B. The b_id would make a good group key for the former, while the a_id would be appropriate for the latter. For the combined key there are different options. COALESCE(b_id, a_id) isn't one of them, because we could have an a_id 1 and a b_id 1 in the result set, but don't want them in the same group. One solution is a simple COALESCE(b_id, -a_id), provided your IDs cannot be negative, of course.

Now, all your calculations are based on the aggregated groups, I.e. you are not interested in single A values when they belong to a B group. For this reason I'd aggregate immediately and only join the single A rows again at the very end.

The order of the rows is COALESCE(b_date, a_date).

 with grouped as
    (
      select
        coalesce(b.b_id, -a.a_id) as grp_id,
        max(coalesce(b.date, a.date)) as grp_date,
        coalesce(max(b.weight), 0) as b_weight,
        sum(a.weight) as a_weight
      from a
      left join b on b.b_id =a.b_id
      group by coalesce(b.b_id, -a.a_id)
    )
    , calculated as
    (
      select
        grp_id,
        grp_date,
        b_weight,
        a_weight,
        sum(a_weight - b_weight) over (order by grp_date) as running_diff
        from grouped
    )
    select *
    from calculated c
    join a on a.b_id = c.grp_id or a.a_id = -c.grp_id
    order by c.grp_date, a.date;

I hope I've got everything right. I don't have a computer at hand and am typing this on my mobile, which turns out harder than I thought :-)

Upvotes: 2

MetaData
MetaData

Reputation: 116

You will have to make changes but this should help.`

SELECT [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (
        PARTITION BY...ORDER BY B_Id
        )
    ,[Accumulate Weight B] = SUM(WeightB) OVER (
        PARTITION BY...ORDER BY B_Id
        )
FROM t2
WHERE B_Id IS NOT NULL

UNION

SELECT [Accumulate Weight A By B_Id] = SUM(TAB.WeightA) OVER (
        PARTITION BY TAB.ROW_NUM ORDER BY B_Id
        )
    ,[Accumulate Weight B] = SUM(TAB.WeightB) OVER (
        PARTITION BY TAB.ROW_NUM ORDER BY B_Id
        )
FROM (
    SELECT WeightA
        ,WeightB
        ,B_Id
        ,ROW_NUMBER() OVER (
            ORDER BY B_ID
            ) AS ROW_NUM
    FROM T2
    WHERE B_ID IS NULL
    ) TAB

`

Upvotes: 0

Cetin Basoz
Cetin Basoz

Reputation: 23837

You can use coalesce().

SELECT 
 *,
 [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (PARTITION BY B_id ORDER BY B_Id),
 [Accumulate Weight B] = SUM(WeightB) OVER (PARTITION BY B_id ORDER BY B_Id),
 SUM(coalesce(WeightA,0)-coalesce(WeightB,0)) OVER (PARTITION BY B_id ORDER BY B_Id) difference

FROM t2

PS: Actually your initial query looks wrong to me, if that was correct then this would do. Probably you should give sample data of A and B. To me it makes more sense to sum() before joining them at all.

Upvotes: 0

Related Questions