Reputation: 564
First, I have this information:
Such that, below result can be obtained:
A_Id | Weight A | Weight B | B_Id |
---|---|---|---|
1 | 3 | 16 | 1 |
2 | 5 | 16 | 1 |
3 | 6 | 16 | 1 |
4 | 7 | 16 | 1 |
5 | 2 | 12 | 2 |
6 | 6 | 12 | 2 |
Now, adding two more columns: Sum Weight A By B_Id
, Accumulative Difference
(consider below as table t2
)
A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Accumulative Diff |
---|---|---|---|---|---|
1 | 3 | 21 | 16 | 1 | 5 |
2 | 5 | 21 | 16 | 1 | 5 |
3 | 6 | 21 | 16 | 1 | 5 |
4 | 7 | 21 | 16 | 1 | 5 |
5 | 2 | 8 | 12 | 2 | 1 |
6 | 6 | 8 | 12 | 2 | 1 |
For example above,
first row accumulative difference => 21 - 16 = 5
the fifth row accumulative difference => (21 + 8) - (16 + 12) = 1
So, my objective, is to compute such 'Accumulative Difference
' The entire result is to be displayed in the report.
Technically, by using 'Window Functions', this can be achieved without problem.
First, I have to create 2 more columns: Accumulate Weight A By B_Id
, Accumulate Weight B
. Then, just find the difference between the two.
I actually need 3 more columns:
A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Row By B_Id | Accumulate Weight A By B_Id | Accumulate Weight B | Accumulative Diff |
---|---|---|---|---|---|---|---|---|
1 | 3 | 21 | 16 | 1 | 1 | 21 | 16 | 5 |
2 | 5 | 21 | 16 | 1 | 2 | 21 | 16 | 5 |
3 | 6 | 21 | 16 | 1 | 3 | 21 | 16 | 5 |
4 | 7 | 21 | 16 | 1 | 4 | 21 | 16 | 5 |
5 | 2 | 8 | 12 | 2 | 1 | 29 | 28 | 1 |
6 | 6 | 8 | 12 | 2 | 2 | 29 | 28 | 1 |
sample SQL (to generatet2
):
;WITH tableA AS (
SELECT [A_Id] = 1, [Weight] = 3, [B_Id] = 1, [date] = '2021-10-01'
UNION
SELECT [A_Id] = 2, [Weight] = 5, [B_Id] = 1, [date] = '2021-10-02'
UNION
SELECT [A_Id] = 3, [Weight] = 6, [B_Id] = 1, [date] = '2021-10-03'
UNION
SELECT [A_Id] = 4, [Weight] = 7, [B_Id] = 1, [date] = '2021-10-04'
UNION
SELECT [A_Id] = 5, [Weight] = 2, [B_Id] = 2, [date] = '2021-10-05'
UNION
SELECT [A_Id] = 6, [Weight] = 6, [B_Id] = 2, [date] = '2021-10-06'
--Uncomment for testing NULL value
--UNION
--SELECT [A_Id] = 7, [Weight] = 9, [B_Id] = NULL, [date] = '2021-10-07'
--UNION
--SELECT [A_Id] = 8, [Weight] = 10, [B_Id] = 3, [date] = '2021-10-08'
),
tableB AS (
SELECT [B_Id] = 1, [Weight] = 16, [date] = '2021-10-03'
UNION
SELECT [B_Id] = 2, [Weight] = 12, [date] = '2021-10-06'
--Uncomment for testing NULL value
--UNION
--SELECT [B_Id] = 3, [Weight] = 8, [date] = '2021-10-08'
),
t1a AS (
SELECT
[A_Id] = tableA.A_Id,
[WeightA] = tableA.Weight,
[WeightB] = tableB.Weight,
[B_Id] = tableB.B_Id,
[Row By B_Id] = ROW_NUMBER() OVER(PARTITION BY tableB.B_Id ORDER BY A_Id)
FROM
tableA
FULL JOIN tableB ON tableA.B_Id = tableB.B_Id
),
t1b AS (
SELECT
*,
[Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY B_Id),
[Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY B_Id)
FROM t1a
),
t2 AS (
SELECT
*,
[Accumulate Difference] = [Sum Weight A By B_Id] - [Accumulate Weight B]
FROM t1b
)
SELECT
*
FROM t2
Now, the problem comes, IF one of the B_Id
is NULL
. (Uncomment the parts to generate NULL B_Id)
Below is my expected result, especially on the highlighted row:
A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Accumulate Weight A By B_Id | Accumulate Weight B | Accumulative Diff |
---|---|---|---|---|---|---|---|
1 | 3 | 21 | 16 | 1 | 21 | 16 | 5 |
2 | 5 | 21 | 16 | 1 | 21 | 16 | 5 |
3 | 6 | 21 | 16 | 1 | 21 | 16 | 5 |
4 | 7 | 21 | 16 | 1 | 21 | 16 | 5 |
5 | 2 | 8 | 12 | 2 | 29 | 28 | 1 |
6 | 6 | 8 | 12 | 2 | 29 | 28 | 1 |
7 | 9 | 9 | 0 | NULL | 38 | 28 | 10 |
8 | 7 | 10 | 8 | 3 | 48 | 36 | 12 |
9 | 3 | 10 | 8 | 3 | 48 | 36 | 12 |
However, with my sample query, this is not working. Instead, below appears:
The NULL B_Id appears on the first row instead. (The order is messed up)
So my question, how to handle such situation? (Keep the original row as in the expected result)
As why the order is in such as way? (raised by @ThorstenKettner)
The order by default, is based B_TransactionDatetime
. If B_Id
is NULL, then it will be based on A_TransactionDatetime
. So, I compute another column RefDateTime = COALESCE(B_TransactionDatetime, A_TransactionDatetime)
, and order based on that.
PS:
Inspired by @ThorstenKettner, i should use the RefDateTime
in the window function, i.e:
[Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY RefDateTime),
[Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY RefDateTime)
Case closed.
Upvotes: 0
Views: 3448
Reputation: 95090
You want to outer join B to A, because not every A has an associated B.
Then you look at the rows blockwise. One block is either all rows.belonging to one B or a single A row that doesn't have a B. The b_id would make a good group key for the former, while the a_id would be appropriate for the latter. For the combined key there are different options. COALESCE(b_id, a_id)
isn't one of them, because we could have an a_id 1 and a b_id 1 in the result set, but don't want them in the same group. One solution is a simple COALESCE(b_id, -a_id)
, provided your IDs cannot be negative, of course.
Now, all your calculations are based on the aggregated groups, I.e. you are not interested in single A values when they belong to a B group. For this reason I'd aggregate immediately and only join the single A rows again at the very end.
The order of the rows is COALESCE(b_date, a_date)
.
with grouped as
(
select
coalesce(b.b_id, -a.a_id) as grp_id,
max(coalesce(b.date, a.date)) as grp_date,
coalesce(max(b.weight), 0) as b_weight,
sum(a.weight) as a_weight
from a
left join b on b.b_id =a.b_id
group by coalesce(b.b_id, -a.a_id)
)
, calculated as
(
select
grp_id,
grp_date,
b_weight,
a_weight,
sum(a_weight - b_weight) over (order by grp_date) as running_diff
from grouped
)
select *
from calculated c
join a on a.b_id = c.grp_id or a.a_id = -c.grp_id
order by c.grp_date, a.date;
I hope I've got everything right. I don't have a computer at hand and am typing this on my mobile, which turns out harder than I thought :-)
Upvotes: 2
Reputation: 116
You will have to make changes but this should help.`
SELECT [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (
PARTITION BY...ORDER BY B_Id
)
,[Accumulate Weight B] = SUM(WeightB) OVER (
PARTITION BY...ORDER BY B_Id
)
FROM t2
WHERE B_Id IS NOT NULL
UNION
SELECT [Accumulate Weight A By B_Id] = SUM(TAB.WeightA) OVER (
PARTITION BY TAB.ROW_NUM ORDER BY B_Id
)
,[Accumulate Weight B] = SUM(TAB.WeightB) OVER (
PARTITION BY TAB.ROW_NUM ORDER BY B_Id
)
FROM (
SELECT WeightA
,WeightB
,B_Id
,ROW_NUMBER() OVER (
ORDER BY B_ID
) AS ROW_NUM
FROM T2
WHERE B_ID IS NULL
) TAB
`
Upvotes: 0
Reputation: 23837
You can use coalesce().
SELECT
*,
[Accumulate Weight A By B_Id] = SUM(WeightA) OVER (PARTITION BY B_id ORDER BY B_Id),
[Accumulate Weight B] = SUM(WeightB) OVER (PARTITION BY B_id ORDER BY B_Id),
SUM(coalesce(WeightA,0)-coalesce(WeightB,0)) OVER (PARTITION BY B_id ORDER BY B_Id) difference
FROM t2
PS: Actually your initial query looks wrong to me, if that was correct then this would do. Probably you should give sample data of A and B. To me it makes more sense to sum() before joining them at all.
Upvotes: 0