Ismedriel
Ismedriel

Reputation: 89

How to avoid transition between column-organized data processing and row-organized data processing

I'm working on DB2 Blu on with column organized tables.

My dataset is the following :

Day month year value
------- -------
20200101 202001 2020 100
20200102 202001 2020 110
...
20200215 202002 2020 120

I want to aggregate by week, month and year for this result :

Id value
2020 12000
202001 4000 'january
202002 4000 'february
2020001 700 'first week of 2020

In order to do this, I also have the table d_tps

Type Id week month year
J 20200101 2020001 202001 2020
J 20200102 2020001 202001 2020
...

J 20200215 2020007 202002 2020
M 202001 null 202001 2020
M 202002 null 202002 2020
Y 2020 null null 2020

My approach is the following

select d.id, sum(value) from tab1
Inner join d_tps d
On d.id = tab1.year
Or d.id = tab1.month
Or d.id = tab1.year
group by d.id

It works and return the expected result. Unfortunately, in the query plan, the join with OR condition causes the CTQ operator to come early and most of the query (which is in reality more complex) is treated as rows instead of columns.

How can I optimize it ?

Upvotes: 1

Views: 62

Answers (2)

Mark Barinstein
Mark Barinstein

Reputation: 12339

You should use GROUP BY GROUPING SETS & GROUPING function to achieve what you want.

WITH T (day, month, year, value) AS 
(
values 
  (20200101, 202001, 2020, 100)
, (20200102, 202001, 2020, 110)
, (20200215, 202002, 2020, 120)
)
SELECT 
CASE 
  WHEN GROUPING(DAY)   = 0 THEN DAY
  WHEN GROUPING(MONTH) = 0 THEN MONTH
  WHEN GROUPING(YEAR ) = 0 THEN YEAR
END AS ID
, SUM(VALUE) AS VALUE
FROM T
GROUP BY GROUPING SETS (DAY, MONTH, YEAR);

The result is:

|ID         |VALUE      |
|-----------|-----------|
|2020       |330        |
|202001     |210        |
|202002     |120        |
|20200101   |100        |
|20200102   |110        |
|20200215   |120        |

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270713

It looks like one join condition is sufficient along with aggregation:

select d.week, sum(value)
from tab1 Inner join
     d_tps d
     On d.id = tab1.day
group by d.week

If you want to aggregate by multiple time levels, then use grouping sets:

select d.week, d.month, d.year, sum(value)
from tab1 Inner join
     d_tps d
     On d.id = tab1.day
group by grouping sets ((d.week), (d.month), (d.year))

Upvotes: 1

Related Questions