How to group by specify range of years and month?

Question

I have a large set of data for a range of product from two table that I want to display in a specify date range with a group by function according to the current date. Data range is:

Lets say today's date is 03/09/2012 (DD/MM/YYYY)

--Product 1--
Everything 5 years ago 'Before 2007'  {field1}   {field2}   {field3} 
4 years ago          '2008' 
3 years ago          '2009' 
2 years ago          '2010' 
1 year by month      'Jan 2011' 
'Feb 2011' 
'Mar 2011 
.... 
.... 
'Dec 2011' 
Sum of 1 year ago    '2011' 
This year by month   'Jan 2012' 
'Feb 2012' 
'Mar 2012' 
.... 
....
'Sept 2012' 
Sum of this year '2012'

The performance of this sql is important. So far I got a sql that can be further group by year or month for each product but not in above order. Im thinking using NVL, CASE and many nested sql but anyone can think of a solution that can get a good performance?

    SELECT EXTRACT (YEAR FROM {DATE}) "YEAR", EXTRACT (MONTH FROM {DATE}) "MONTH", SUM({field1}) as A, SUM({field 2}) as B ,COUNT(1) as {field 3}
    FROM (
            SELECT {Field A}, DECODE({Field Key1}, NULL, 0, 1) {field 1}, DECODE({field B}, NULL, 1, 0)  {field2}, {Field Key2}
            FROM {table A}, (
                    SELECT {field key2}
                    FROM {table B}
                    WHERE {conditions} B  
            WHERE A.KEY= B.KEY(+)
    )
    where {conditions}
    GROUP BY EXTRACT (YEAR FROM {DATE}) , EXTRACT (MONTH FROM {DATE}) 
  ) DATASET

Alex Poole · Accepted Answer

I'm a bit unclear which part you're struggling with, or why you refer to many nested SQLs. Without any sample data I've had to make some up:

create table t42 (my_date date, my_value number);

insert into t42 values (date '2006-01-31', 0601);
insert into t42 values (date '2006-12-31', 0612);
insert into t42 values (date '2007-01-31', 0701);
insert into t42 values (date '2007-12-31', 0712);
insert into t42 values (date '2008-01-31', 0801);
insert into t42 values (date '2008-12-31', 0812);
insert into t42 values (date '2009-01-31', 0901);
insert into t42 values (date '2009-12-31', 0912);
insert into t42 values (date '2010-01-31', 1001);
insert into t42 values (date '2010-12-31', 1012);
insert into t42 values (date '2011-01-31', 1101);
insert into t42 values (date '2011-12-31', 1112);
insert into t42 values (date '2012-01-31', 1201);
insert into t42 values (date '2012-02-29', 1202);
insert into t42 values (date '2012-03-31', 1203);
insert into t42 values (date '2012-04-30', 1204);
insert into t42 values (date '2012-05-31', 1205);

You can then use an inner query to produce the 'label' for the period/year/month, and a dummy field to order the results by, plus the values you're actually interested in. Then an outer query to do any sum, count etc.

select label, sum(my_value), count(1)
from
(
    select
        case when my_date < trunc(sysdate, 'YYYY') - interval '4' year then
                'Before ' || to_char(trunc(sysdate, 'YYYY')
                - interval '5' year, 'YYYY')
            when my_date < trunc(sysdate, 'YYYY') - interval '1' year then
                to_char(my_date, 'YYYY')
            else to_char(my_date, 'Mon YYYY')
        end as label,
        case when my_date < trunc(sysdate, 'YYYY') - interval '4' year then
                to_char(trunc(sysdate, 'YYYY') - interval '5' year, 'YYYYMM')
            when my_date < trunc(sysdate, 'YYYY') - interval '1' year then
                to_char(my_date, 'YYYY') || '01'
            else to_char(my_date, 'YYYYMM')
        end as order_field,
        my_value
    from t42
)
group by label, order_field
order by order_field;

I'm using trunc(sysdate, 'YYYY') to find the start of the current year, and then using interval to go back five years, and producing the label and order_field psuedo-columns base on those 'buckets'. Using case like this lets me having varying buckets - one for everything more than five years old, one for each year up to last year, and one for each month since then.

LABEL             SUM(MY_VALUE)   COUNT(1)
----------------- ------------- ----------
Before 2007                2626          4
2008                       1613          2
2009                       1813          2
2010                       2013          2
Jan 2011                   1101          1
Dec 2011                   1112          1
Jan 2012                   1201          1
Feb 2012                   1202          1
Mar 2012                   1203          1
Apr 2012                   1204          1
May 2012                   1205          1

I'm only hitting the table once, so the performance should depend on how you're extracting the raw data (your joins and conditions) rather then how you're manipulating it. Obviously you can replace t42 with your current join between your two tables, and pull out the fields you're interested in.

I'd suggest you switch to the ANSI join syntax rather than Oracle's old (+) notation for outer joins. And this doesn't deal with any years or months which don't have any data, but neither does your original outline, so that may not be an issue. And how you produce, or at least display, the 'Sum of year' values may depend on your client.

Separating out the label generation into a view to make it reusable and allow you to find periods that have no data:

create or replace view v42 (period_label, period_order, period_start, period_end)
as
select 'Before ' || to_char(trunc(sysdate, 'YYYY') - interval '5' year, 'YYYY'),
    '197001',
    date '1970-01-01',
    trunc(sysdate, 'YYYY') - interval '4' year  - interval '1' second
from dual
union
select to_char(year_start, 'YYYY'),
    to_char(year_start, 'YYYY') || '01',
    year_start,
    year_start + interval '1' year - interval '1' second
from (
    select add_months(trunc(sysdate, 'YYYY'), - 12 * (level + 1)) as year_start
    from dual connect by level <= 3
)
union
select to_char(month_start, 'Mon YYYY'),
    to_char(month_start, 'YYYYMM'),
    month_start,
    month_start + interval '1' month - interval '1' second
from (
    select add_months(trunc(sysdate, 'MM'), 1 - level) as month_start
    from dual connect by level <= 12 + to_number(to_char(sysdate, 'MM'))
);

This is to produce the labels you had originally; if you want all years shown individually remove the first part of the union, and you can adjust the connect by clauses to vary the years which show by month. (You could possibly parameterise it but that's probably going a bit far).

You had three classes of 'bucket', one class broken down by month, one by year, and then a catch-all for anything too far in the past; each part of the union addresses one of those, generating period start and end dates for all buckets within each class, plus the label and something to order by later. Look the view, and maybe each select individually, to see what they're doing.

Then join that view to your data table, or tables; with a left outer join if you want to show labels with no matching data:

select v.period_label, nvl(sum(t.my_value), 0), count(t.my_value)
from v42 v
left join t42 t on t.my_date between v.period_start and v.period_end
group by v.period_label, v.period_order
order by v.period_order;

PERIOD_LABEL      NVL(SUM(T.MY_VALUE),0) COUNT(T.MY_VALUE)
----------------- ---------------------- -----------------
Before 2007                         2626                 4
2008                                1613                 2
2009                                1813                 2
2010                                2013                 2
Jan 2011                            1101                 1
Feb 2011                               0                 0
...
Nov 2011                               0                 0
Dec 2011                            1112                 1
...

How to group by specify range of years and month?

Answers (2)

Related Questions