Reputation: 229

Group the Result based on RowCount in Oracle

I have a requirement, in which I need to output group the result based on rowCount.

Here is the result set which I get from my SQL:

ID      Date        Count
1     10/01/2013    50
1     10/02/2013    25
1     10/03/2013    100
1     10/04/2013    200 
1     10/05/2013    175
1     10/06/2013     45
2     10/01/2013     85
2     10/02/2013    100

Can i have them as

    id        date    Count
    1     10/03/2013    175
    1     10/04/2013    200
    1     10/05/2013    175
    1     10/06/2013     45
    2     10/02/2013    185

I need to reduce the result set by grouping their count <= 200 per ID. Say for example the combined sum of 10/01, 10/02 and 10/03 came to 175 so I need to group them into one single row. Where as adding the values fir 10/05 and 10/06 would be >200, so leave them un-grouped.

Is it possible in Oracle 11g to solve this using PLSQL or SQL Analytic functions?

New REsult Set Requested Is there a way to return the result having additional Column to it? StartD column For each row it has to take the Previous End Date of that

ID      StartD      EndDate     Count
1       10/01/2013  10/03/2013   175
1       10/03/2013  10/04/2013   200
1       10/04/2013  10/05/2013   250
1       10/05/2013  10/06/2013   190
1       10/06/2013  10/08/2013    45
2       10/01/2013  10/01/2013   185

Upvotes: 2

Answers (3)

Sylvain Leroux

Reputation: 52000

For such a task, you could use a pipelined table function to generate the required result.

There is a little "plumbing" as it requires to define some additional types, but the function by itself is a simple loop over a cursor, accumulating values and generating row either on change of id, or when the accumulated total exceed the limit.

You could implement that many ways. Here, using a plain old loop, instead of a for in cursor, I obtain something not that inelegant:

CREATE OR REPLACE TYPE stuff_row AS OBJECT (
  id          int,
  stamp       date,
  last_stamp  date,
  num         int
);
CREATE OR REPLACE TYPE stuff_tbl AS TABLE OF stuff_row;

CREATE OR REPLACE FUNCTION partition_by_200
RETURN stuff_tbl PIPELINED
AS
  CURSOR data IS SELECT id, stamp, num FROM stuff ORDER BY id, stamp;
  curr data%ROWTYPE;
  acc  stuff_row := stuff_row(NULL,NULL,NULL,NULL);
BEGIN
  OPEN data;
  FETCH data INTO acc.id,acc.stamp,acc.num;
  acc.last_stamp := acc.stamp;


  IF data%FOUND THEN
  LOOP
    FETCH data INTO curr;

    IF data%NOTFOUND OR curr.id <> acc.id OR acc.num+curr.num > 200
    THEN
      PIPE ROW(stuff_row(acc.id,acc.stamp,acc.last_stamp,acc.num));
      EXIT WHEN data%NOTFOUND;

      -- reset the accumulator
      acc := stuff_row(curr.id, curr.stamp, curr.stamp, curr.num);
    ELSE
      -- accumulate value
      acc.num := acc.num + curr.num;
      acc.last_stamp := curr.stamp;
    END IF;
  END LOOP;
  END IF;

  CLOSE data;
END;

Usage:

SELECT * FROM TABLE(partition_by_200());

Using the same test data as Mat in its own answer, this produces:

ID  STAMP       LAST_STAMP  NUM
1   10/01/2013  10/03/2013  175
1   10/04/2013  10/04/2013  200
1   10/05/2013  10/05/2013  250
1   10/06/2013  10/07/2013  190
1   10/08/2013  10/08/2013  45
2   10/01/2013  10/02/2013  185

Upvotes: 1

Mat

Reputation: 206719

You can do this in Oracle 12c with a MATCH_RECOGNIZE pattern matching technique.

Setup (added a few rows, including some with a count above 200, for testing):

create table stuff (id int, stamp date, num int);
insert into stuff values (1, to_date('10/01/2013', 'MM/DD/RRRR'), 50);
insert into stuff values (1, to_date('10/02/2013', 'MM/DD/RRRR'), 25);
insert into stuff values (1, to_date('10/03/2013', 'MM/DD/RRRR'), 100);
insert into stuff values (1, to_date('10/04/2013', 'MM/DD/RRRR'), 200);
insert into stuff values (1, to_date('10/05/2013', 'MM/DD/RRRR'), 250);
insert into stuff values (1, to_date('10/06/2013', 'MM/DD/RRRR'), 175);
insert into stuff values (1, to_date('10/07/2013', 'MM/DD/RRRR'), 15);
insert into stuff values (1, to_date('10/08/2013', 'MM/DD/RRRR'), 45);
insert into stuff values (2, to_date('10/01/2013', 'MM/DD/RRRR'), 85);
insert into stuff values (2, to_date('10/02/2013', 'MM/DD/RRRR'), 100);
commit;

The query would be:

select id, first_stamp, last_stamp, partial_sum
from stuff
match_recognize (
    partition by id order by stamp
    measures
      first(a.stamp) as first_stamp
    , last(a.stamp)  as last_stamp
    , sum(a.num)     as partial_sum
    pattern (A+)
    define A as (sum(a.num) <= 200 or (count(*) = 1 and a.num > 200))
);

Which gives:

        ID FIRST_STAMP LAST_STAMP PARTIAL_SUM
---------- ----------- ---------- -----------
         1 01-OCT-13   03-OCT-13          175 
         1 04-OCT-13   04-OCT-13          200 
         1 05-OCT-13   05-OCT-13          250 
         1 06-OCT-13   07-OCT-13          190 
         1 08-OCT-13   08-OCT-13           45 
         2 01-OCT-13   02-OCT-13          185 

 6 rows selected

How this works:

The pattern matching is done over the whole table, partitioned by id and ordered by timestamp.
The pattern A+ says we want groups of consecutive (according to the partition and order by clauses) rows that satisfy condition A.
The condition A is that the set satisfies:
- The sum of num in the set is 200 or less
- Or the set has single row with num greater than 200 (otherwise these rows never match, and aren't output).
The measures clause indicates what the match returns (on top of the partition key):
- The first and last timestamps from each the group
- The sum of num for each group

Here's an approach with a table-valued function that should work in 11g (and 10g I think). Rather inelegant, but does the job. Traverses the table in order, outputting groups whenever they're "full".

You could add a parameter for the group size too.

create or replace 
type my_row is object (id int, stamp date, num int);

create or replace 
type my_tab as table of my_row;

create or replace
  function custom_stuff_groups
    return my_tab pipelined
  as
    cur_sum number;
    cur_id  number;
    cur_dt  date;
  begin
    cur_sum := null;
    cur_id  := null;
    cur_dt  := null;
    for x in (select id, stamp, num from stuff order by id, stamp)
    loop
      if (cur_sum is null) then
        -- very first row
        cur_id      := x.id;
        cur_sum     := x.num;
      elsif (cur_id != x.id) then
        -- changed ID, so output last line for previous id and reset
        pipe row(my_row(cur_id, cur_dt, cur_sum));
        cur_id              := x.id;
        cur_sum             := x.num;
      elsif (cur_sum + x.num > 200) then
        -- same id, sum overflows.
        pipe row(my_row(cur_id, cur_dt, cur_sum));
        cur_sum := x.num;
      else
        -- same id, sum still below 200
        cur_sum := cur_sum + x.num;
      end if;
      cur_dt := x.stamp;
    end loop;
    if (cur_sum is not null) then
      -- output the last line, if any
      pipe row(my_row(cur_id, cur_dt, cur_sum));
    end if;
  end;

Use as:

select * from table(custom_stuff_groups());

Upvotes: 3

user330315

Reputation:

This returns the expected result based on your sample data. I'm not 100% sure though if it will work in all cases (and it's probably not going to be very efficient):

with summed_values as (
  select stuff.*,
         case 
             when sum(cnt) over (partition by id order by count_date) >= 200 then 1
             else 0
         end as sum_group
  from stuff
), totals as (
  select id,
         max(count_date) as last_count,
         sum(cnt) as total_count
  from summed_values
  where sum_group = 0
  group by id
  union all
  select id,
         count_date as last_count,
         sum(cnt) as total_count
  from summed_values
  where sum_group = 1
  group by id, count_date
)
select *
from totals
order by id, last_count
;

SQLFiddle example: http://sqlfiddle.com/#!4/4e0d8/1

Upvotes: 2

Group the Result based on RowCount in Oracle

Answers (3)

Related Questions