Reputation: 413

split records into buckets based on a sum of counts

i have a table that looks like below. i need to find a way to pick out phone numbers based on a sum of counts (the number will always be different but let's use 130 for this example).

So one of the solutions would be rows 1 through 5 and 11 (if you add up CountOfPeople values from those rows you will get 130). or 1-4,6,7,9,11,12. it doesn't matter which phone numbers are picked, as long as the total is 130.

sometimes you might not be able to get exactly 130, so "as close as possible but not exceeding" would be the rule.

is there a way to do this?

AutoID  Phone Number    Count Of People
1   5565787 57
2   2342343 30
3   2654456 17
4   3868556 12
5   9856756 12
6   9756456 4
7   4346365 4
8   2376743 3
9   9756343 3
10  2524349 3
11  2029393 2
12  9285656 1

Upvotes: 3

Answers (3)

peter.hrasko.sk

Reputation: 4141

For the "first bucket" solution this is a nice exercise in recursive subquery factoring. The following query gives you such a bucket (although with phone numbers concatenated to a single string):

with source$ as (
    select 1 as AutoID, '5565787' as Phone_Number, 12 as Count_Of_People from dual union all
    select 2, '2342343', 3 from dual union all
    select 3, '2654456', 1 from dual union all
    select 4, '3868556', 12 from dual union all
    select 5, '9856756', 4 from dual union all
    select 6, '9756456', 4 from dual union all
    select 7, '4346365', 57 from dual union all
    select 8, '2376743', 3 from dual union all
    select 9, '9756343', 3 from dual union all
    select 10, '2524349', 30 from dual union all
    select 11, '2029393', 2 from dual union all
    select 12, '9285656', 17 from dual
),
permutator$ (autoid, phone_number, count_of_people, autoid_list, phone_number_list, count_of_people_sum, count_of_people_list) as (
    select S.autoid, phone_number, count_of_people,
        to_char(autoid), cast(phone_number as varchar2(4000)), count_of_people, to_char(count_of_people)
    from source$ S
    union all
    select S.autoid, S.phone_number, S.count_of_people,
        P.autoid_list||'|'||S.autoid, P.phone_number_list||'|'||S.phone_number, P.count_of_people_sum + S.count_of_people, P.count_of_people_list||'+'||S.count_of_people
    from permutator$ P
        join source$ S
            on S.autoid > P.autoid
    where P.count_of_people_sum + S.count_of_people <= 130
)
search depth first by autoid asc set siblings_order$,
priority_ordered$ as (
    select P.*,
        row_number() over (partition by null order by abs(count_of_people_sum-130), siblings_order$ asc) as your_best_call$
    from permutator$ P
)
select autoid_list, phone_number_list, count_of_people_sum, count_of_people_list
from priority_ordered$
where your_best_call$ = 1
;

... and if you rather want a row-by-row list of the original items, then replace the last ...

select autoid_list, phone_number_list, count_of_people_sum, count_of_people_list
from priority_ordered$
where your_best_call$ = 1
;

... with ...

select autoid, count_of_people, phone_number
from priority_ordered$ PO
start with your_best_call$ = 1
connect by PO.autoid_list||'|'||prior PO.autoid = prior PO.autoid_list
;

With a little help from the object-relational features of Oracle the phone number collection may be, in a very elegant way, resolved by a collector object (an object which collects data to its member collection attribute via a member method, returning a new instance of its class). A small example of SQL*Plus spools for this solution:

SQL> set verify off

SQL> define maxcountofpeoplesum = 130
SQL> @@23023283-split-records-into-buckets-based-on-a-sum-of-counts.sql

COUNT_OF_PEOPLE_SUM     AUTOID PHONE_NUMBER    COUNT_OF_PEOPLE
------------------- ---------- --------------- ---------------
                130          1 5565787                      12
                130          2 2342343                       3
                130          3 2654456                       1
                130          5 9856756                       4
                130          6 9756456                       4
                130          7 4346365                      57
                130         10 2524349                      30
                130         11 2029393                       2
                130         12 9285656                      17

9 rows selected.

SQL> define maxcountofpeoplesum = 15
SQL> @@23023283-split-records-into-buckets-based-on-a-sum-of-counts.sql

COUNT_OF_PEOPLE_SUM     AUTOID PHONE_NUMBER    COUNT_OF_PEOPLE
------------------- ---------- --------------- ---------------
                 15          1 5565787                      12
                 15          2 2342343                       3

SQL> define maxcountofpeoplesum = 200
SQL> @@23023283-split-records-into-buckets-based-on-a-sum-of-counts.sql

COUNT_OF_PEOPLE_SUM     AUTOID PHONE_NUMBER    COUNT_OF_PEOPLE
------------------- ---------- --------------- ---------------
                148          1 5565787                      12
                148          2 2342343                       3
                148          3 2654456                       1
                148          4 3868556                      12
                148          5 9856756                       4
                148          6 9756456                       4
                148          7 4346365                      57
                148          8 2376743                       3
                148          9 9756343                       3
                148         10 2524349                      30
                148         11 2029393                       2
                148         12 9285656                      17

12 rows selected.

SQL> define maxcountofpeoplesum = 147
SQL> @@23023283-split-records-into-buckets-based-on-a-sum-of-counts.sql

COUNT_OF_PEOPLE_SUM     AUTOID PHONE_NUMBER    COUNT_OF_PEOPLE
------------------- ---------- --------------- ---------------
                147          1 5565787                      12
                147          2 2342343                       3
                147          4 3868556                      12
                147          5 9856756                       4
                147          6 9756456                       4
                147          7 4346365                      57
                147          8 2376743                       3
                147          9 9756343                       3
                147         10 2524349                      30
                147         11 2029393                       2
                147         12 9285656                      17

11 rows selected.

I'm pretty sure the query could be enhanced to query all buckets, as Dmitry's solution does, but that would result in an even huger and possibly badly performing query. Dmitry's solution looks much simpler and more straightforward for your problem.

Enjoy.

Upvotes: 3

mikron

Reputation: 683

You can also try to use User-Defined Aggregate function. Will try to show You in a little example. First of all, we need to create table types:

create or replace type TTN as table of number;
/

Then we are creating the routines that need to be implemented to define a user-defined aggregate function.

create or replace type TO_BALANCED_BUCKET as object
(
   summ TTN,
   result int,

   static function ODCIAggregateInitialize(sctx in out nocopy TO_BALANCED_BUCKET) return number,

   member function ODCIAggregateIterate(self in out nocopy TO_BALANCED_BUCKET, value in number)
      return number,

   member function ODCIAggregateTerminate(self in TO_BALANCED_BUCKET,
                                          returnValue out number,
                                          flags in number) return number,

   member function ODCIAggregateMerge(self in out nocopy TO_BALANCED_BUCKET, ctx2 in TO_BALANCED_BUCKET)
      return number
)
/
create or replace type body TO_BALANCED_BUCKET is

   static function ODCIAggregateInitialize(sctx in out nocopy TO_BALANCED_BUCKET) return number is
   begin
      sctx := TO_BALANCED_BUCKET(TTN(0), 1);
      return ODCIConst.Success;
   end;

   member function ODCIAggregateIterate(self in out nocopy TO_BALANCED_BUCKET, value in number)
      return number is      
      b_FoundGroup boolean := false;
   begin
      if value > 130 then
         result := 0;
      else         
         for li in 1..summ.count loop
             if summ(li) + value <= 130 then
                b_FoundGroup := true;
                summ(li) := summ(li) + value;
                result := li;   
                exit;
             end if;
         end loop;         
         if not b_FoundGroup then
            summ.extend;
            summ(summ.count) := value;            
            result := summ.count;
         end if;         
      end if;  
      return ODCIConst.Success;
   end;

   member function ODCIAggregateTerminate(self in TO_BALANCED_BUCKET,
                                          returnValue out number,
                                          flags in number) return number is
   begin
      returnValue := self.result;      
      return ODCIConst.Success;
   end;

   member function ODCIAggregateMerge(self in out nocopy TO_BALANCED_BUCKET, ctx2 in TO_BALANCED_BUCKET)
      return number is
   begin
      return ODCIConst.Error;
   end;

end;
/

Then we are creating the aggregate function itself.

create or replace function balanced_bucket(input number) return number
   parallel_enable
   aggregate using TO_BALANCED_BUCKET; 
/

And finally the query itself

with test_data as (
    select 1 as AutoID, '5565787' as Phone_Number, 12 as Count_Of_People from dual union all
    select 2, '2342343', 3 from dual union all
    select 3, '2654456', 1 from dual union all
    select 4, '3868556', 12 from dual union all
    select 5, '9856756', 4 from dual union all
    select 6, '9756456', 4 from dual union all
    select 7, '4346365', 57 from dual union all
    select 8, '2376743', 3 from dual union all
    select 9, '9756343', 3 from dual union all
    select 10, '2524349', 30 from dual union all
    select 11, '2029393', 2 from dual union all
    select 12, '9285656', 17 from dual
)
select t.phone_number, t.count_of_people, 
       balanced_bucket(t.count_of_people) over(order by t.count_of_people desc) balanced_bucket
  from test_data t

Hope this solution will help. The algorithm of distribution of clients is Dmity's.

Upvotes: 3

Dmitriy

Reputation: 5565

I'm not sure that problem could be solved with pure SQL. But you can use table functions. Here is a little example for your problem. First of all, we need to create table type:

create type t_bucket_row as object(
    phone_number varchar2(10),
    count_of_people number,
    bucket_no number);
/
create type t_bucket_table as table of t_bucket_row; 
/

And table with test data:

create table test_data as 
with t as (
  select 1 AutoID, '5565787' Phone_Number, 57 Count_Of_People from dual union all
  select 2,   '2342343', 30 from dual union all
  select 3,   '2654456', 17 from dual union all
  select 4,   '3868556', 12 from dual union all
  select 5,   '9856756', 12 from dual union all
  select 6,   '9756456', 4 from dual union all
  select 7,   '4346365', 4 from dual union all
  select 8,   '2376743', 3 from dual union all
  select 9,   '9756343', 3 from dual union all
  select 10,  '2524349', 3 from dual union all
  select 11,  '2029393', 2 from dual union all
  select 12,  '9285656', 1 from dual)
select * from t;

Then we create a function that implements an algorithm of distribution of clients (sorry, there is no comments in the code how it works, but it works; I can write it later if you need). Here we create a variable of table type, fill it with phones and bucket numbers, then return it from a function. After that, in SQL query, we use function's result as table in FROM clause. Parameter p_sum is your desired sum of counts of clients:

create or replace function get_buckets(p_sum number) return t_bucket_table is
  buckets t_bucket_table := t_bucket_table();
  type bucket_sums is table of number index by binary_integer;
  sums bucket_sums;
  counter number := 0;
  found boolean;
begin
  sums(1) := 0;

-- next line was edited to fix bug in resuult of distribution:
  for i in (select t.*, rownum from test_data t order by t.count_of_people desc) loop
    buckets.extend;
    counter := counter + 1;
    buckets(counter) := t_bucket_row(i.phone_number, i.count_of_people, 0);

    if i.count_of_people > p_sum then
       continue;
    end if;

    found := false;
    for j in 1..sums.count loop
      if sums(j) + i.count_of_people <= p_sum then
         sums(j) := sums(j) + i.count_of_people;
         buckets(counter).bucket_no := j;
         found := true;
         exit;
      end if;
    end loop;
    if not found then
       sums(sums.count + 1) := i.count_of_people;
       buckets(counter).bucket_no := sums.count;
    end if;

  end loop; 

  return buckets;
end;
/

Now we can execute this function. Result is:

SQL> select * from table(get_buckets(130));

PHONE_NUMB COUNT_OF_PEOPLE  BUCKET_NO
---------- --------------- ----------
5565787                 57          1
2342343                 30          1
2654456                 17          1
3868556                 12          1
9856756                 12          1
9756456                  4          2
4346365                  4          2
2376743                  3          2
9756343                  3          2
2524349                  3          2
2029393                  2          1
9285656                  1          2

12 rows selected.

Buckets distribution:

select bucket_no, sum(count_of_people) from table(get_buckets(130)) group by bucket_no;

 BUCKET_NO SUM(COUNT_OF_PEOPLE)
---------- --------------------
        1           130
        2            18

If count_of_people is more than p_sum, it goes to bucket "0":

SQL> select * from table(get_buckets(35));

PHONE_NUMB COUNT_OF_PEOPLE  BUCKET_NO
---------- --------------- ----------
5565787                 57          0
2342343                 30          1
2654456                 17          2
3868556                 12          2
9856756                 12          3
9756456                  4          1
4346365                  4          2
2376743                  3          3
9756343                  3          3
2524349                  3          3
2029393                  2          2
9285656                  1          1

12 rows selected.

SQL> select bucket_no, sum(count_of_people) from table(get_buckets(35)) group by bucket_no;

 BUCKET_NO SUM(COUNT_OF_PEOPLE)
---------- --------------------
         1                   35
         2                   35
         3                   21
         0                   57

Upvotes: 3

split records into buckets based on a sum of counts

Answers (3)

Related Questions