Ragav
Ragav

Reputation: 229

Group the Result based on RowCount in Oracle

I have a requirement, in which I need to output group the result based on rowCount.

Here is the result set which I get from my SQL:

ID      Date        Count
1     10/01/2013    50
1     10/02/2013    25
1     10/03/2013    100
1     10/04/2013    200 
1     10/05/2013    175
1     10/06/2013     45
2     10/01/2013     85
2     10/02/2013    100 

Can i have them as

    id        date    Count
    1     10/03/2013    175
    1     10/04/2013    200
    1     10/05/2013    175
    1     10/06/2013     45
    2     10/02/2013    185

I need to reduce the result set by grouping their count <= 200 per ID. Say for example the combined sum of 10/01, 10/02 and 10/03 came to 175 so I need to group them into one single row. Where as adding the values fir 10/05 and 10/06 would be >200, so leave them un-grouped.

Is it possible in Oracle 11g to solve this using PLSQL or SQL Analytic functions?

New REsult Set Requested Is there a way to return the result having additional Column to it? StartD column For each row it has to take the Previous End Date of that

ID      StartD      EndDate     Count
1       10/01/2013  10/03/2013   175
1       10/03/2013  10/04/2013   200
1       10/04/2013  10/05/2013   250
1       10/05/2013  10/06/2013   190
1       10/06/2013  10/08/2013    45
2       10/01/2013  10/01/2013   185

Upvotes: 2

Views: 344

Answers (3)

Sylvain Leroux
Sylvain Leroux

Reputation: 52000

For such a task, you could use a pipelined table function to generate the required result.

There is a little "plumbing" as it requires to define some additional types, but the function by itself is a simple loop over a cursor, accumulating values and generating row either on change of id, or when the accumulated total exceed the limit.

You could implement that many ways. Here, using a plain old loop, instead of a for in cursor, I obtain something not that inelegant:

CREATE OR REPLACE TYPE stuff_row AS OBJECT (
  id          int,
  stamp       date,
  last_stamp  date,
  num         int
);
CREATE OR REPLACE TYPE stuff_tbl AS TABLE OF stuff_row;
CREATE OR REPLACE FUNCTION partition_by_200
RETURN stuff_tbl PIPELINED
AS
  CURSOR data IS SELECT id, stamp, num FROM stuff ORDER BY id, stamp;
  curr data%ROWTYPE;
  acc  stuff_row := stuff_row(NULL,NULL,NULL,NULL);
BEGIN
  OPEN data;
  FETCH data INTO acc.id,acc.stamp,acc.num;
  acc.last_stamp := acc.stamp;


  IF data%FOUND THEN
  LOOP
    FETCH data INTO curr;

    IF data%NOTFOUND OR curr.id <> acc.id OR acc.num+curr.num > 200
    THEN
      PIPE ROW(stuff_row(acc.id,acc.stamp,acc.last_stamp,acc.num));
      EXIT WHEN data%NOTFOUND;

      -- reset the accumulator
      acc := stuff_row(curr.id, curr.stamp, curr.stamp, curr.num);
    ELSE
      -- accumulate value
      acc.num := acc.num + curr.num;
      acc.last_stamp := curr.stamp;
    END IF;
  END LOOP;
  END IF;

  CLOSE data;
END;

Usage:

SELECT * FROM TABLE(partition_by_200());

Using the same test data as Mat in its own answer, this produces:

ID  STAMP       LAST_STAMP  NUM
1   10/01/2013  10/03/2013  175
1   10/04/2013  10/04/2013  200
1   10/05/2013  10/05/2013  250
1   10/06/2013  10/07/2013  190
1   10/08/2013  10/08/2013  45
2   10/01/2013  10/02/2013  185

Upvotes: 1

Mat
Mat

Reputation: 206719

You can do this in Oracle 12c with a MATCH_RECOGNIZE pattern matching technique.

Setup (added a few rows, including some with a count above 200, for testing):

create table stuff (id int, stamp date, num int);
insert into stuff values (1, to_date('10/01/2013', 'MM/DD/RRRR'), 50);
insert into stuff values (1, to_date('10/02/2013', 'MM/DD/RRRR'), 25);
insert into stuff values (1, to_date('10/03/2013', 'MM/DD/RRRR'), 100);
insert into stuff values (1, to_date('10/04/2013', 'MM/DD/RRRR'), 200);
insert into stuff values (1, to_date('10/05/2013', 'MM/DD/RRRR'), 250);
insert into stuff values (1, to_date('10/06/2013', 'MM/DD/RRRR'), 175);
insert into stuff values (1, to_date('10/07/2013', 'MM/DD/RRRR'), 15);
insert into stuff values (1, to_date('10/08/2013', 'MM/DD/RRRR'), 45);
insert into stuff values (2, to_date('10/01/2013', 'MM/DD/RRRR'), 85);
insert into stuff values (2, to_date('10/02/2013', 'MM/DD/RRRR'), 100);
commit;

The query would be:

select id, first_stamp, last_stamp, partial_sum
from stuff
match_recognize (
    partition by id order by stamp
    measures
      first(a.stamp) as first_stamp
    , last(a.stamp)  as last_stamp
    , sum(a.num)     as partial_sum
    pattern (A+)
    define A as (sum(a.num) <= 200 or (count(*) = 1 and a.num > 200))
);

Which gives:

        ID FIRST_STAMP LAST_STAMP PARTIAL_SUM
---------- ----------- ---------- -----------
         1 01-OCT-13   03-OCT-13          175 
         1 04-OCT-13   04-OCT-13          200 
         1 05-OCT-13   05-OCT-13          250 
         1 06-OCT-13   07-OCT-13          190 
         1 08-OCT-13   08-OCT-13           45 
         2 01-OCT-13   02-OCT-13          185 

 6 rows selected 

How this works:

  • The pattern matching is done over the whole table, partitioned by id and ordered by timestamp.
  • The pattern A+ says we want groups of consecutive (according to the partition and order by clauses) rows that satisfy condition A.
  • The condition A is that the set satisfies:
    • The sum of num in the set is 200 or less
    • Or the set has single row with num greater than 200 (otherwise these rows never match, and aren't output).
  • The measures clause indicates what the match returns (on top of the partition key):
    • The first and last timestamps from each the group
    • The sum of num for each group

Here's an approach with a table-valued function that should work in 11g (and 10g I think). Rather inelegant, but does the job. Traverses the table in order, outputting groups whenever they're "full".

You could add a parameter for the group size too.

create or replace 
type my_row is object (id int, stamp date, num int);

create or replace 
type my_tab as table of my_row;

create or replace
  function custom_stuff_groups
    return my_tab pipelined
  as
    cur_sum number;
    cur_id  number;
    cur_dt  date;
  begin
    cur_sum := null;
    cur_id  := null;
    cur_dt  := null;
    for x in (select id, stamp, num from stuff order by id, stamp)
    loop
      if (cur_sum is null) then
        -- very first row
        cur_id      := x.id;
        cur_sum     := x.num;
      elsif (cur_id != x.id) then
        -- changed ID, so output last line for previous id and reset
        pipe row(my_row(cur_id, cur_dt, cur_sum));
        cur_id              := x.id;
        cur_sum             := x.num;
      elsif (cur_sum + x.num > 200) then
        -- same id, sum overflows.
        pipe row(my_row(cur_id, cur_dt, cur_sum));
        cur_sum := x.num;
      else
        -- same id, sum still below 200
        cur_sum := cur_sum + x.num;
      end if;
      cur_dt := x.stamp;
    end loop;
    if (cur_sum is not null) then
      -- output the last line, if any
      pipe row(my_row(cur_id, cur_dt, cur_sum));
    end if;
  end;

Use as:

select * from table(custom_stuff_groups());

Upvotes: 3

user330315
user330315

Reputation:

This returns the expected result based on your sample data. I'm not 100% sure though if it will work in all cases (and it's probably not going to be very efficient):

with summed_values as (
  select stuff.*,
         case 
             when sum(cnt) over (partition by id order by count_date) >= 200 then 1
             else 0
         end as sum_group
  from stuff
), totals as (
  select id,
         max(count_date) as last_count,
         sum(cnt) as total_count
  from summed_values
  where sum_group = 0
  group by id
  union all
  select id,
         count_date as last_count,
         sum(cnt) as total_count
  from summed_values
  where sum_group = 1
  group by id, count_date
)
select *
from totals
order by id, last_count
;

SQLFiddle example: http://sqlfiddle.com/#!4/4e0d8/1

Upvotes: 2

Related Questions