Reputation: 497
I have a table with two columns BRANCH
and ACTIVITIES
, where BRANCH
is a unique id of location and ACTIVITIES
are number of records belong to respective BRANCH
. These records to be distributed in 5 buckets in a way that all buckets should contain almost equal number of records. (no matter if difference is +/-1000)
The challenge is if one branch is selected in a bucket then all activities of same branch will also be selected in same bucket, in other words, number of activities belong to one BRANCH
cannot be split. Lets take a very simple example so that I can explain what I am trying to achieve
Total Branches=10
Total Number of activities (records) = 55,000
Average (total activities/total buckets) = 11,000
Sample Data
After Distribution
All buckets contain 11,000 records but things are not such straight forward when we look into real data real data
All Oracle query masters are requested to please look into this. Your expert opinion will highly be appreciated.
Upvotes: 1
Views: 503
Reputation: 1555
You could also try below query. I added some stats columns in this inline view stats_cols_added_tab before I applied dense_rank analytic function to that inline view. Finally I used NTILE analytic function to get five groups.
with sample_data (branch, activities) as (
select 1, 1000 from dual union all
select 2, 2000 from dual union all
select 3, 3000 from dual union all
select 4, 4000 from dual union all
select 5, 5000 from dual union all
select 6, 6000 from dual union all
select 7, 7000 from dual union all
select 8, 8000 from dual union all
select 9, 9000 from dual union all
select 10, 10000 from dual
)
,
stats_cols_added_tab as (
select s.*
, count(*)over() total_branches
, sum(activities)over() total_number_of_activities
, avg(activities)over() * 2 Average
, case when row_number()over(order by s.branch) <= count(*)over() / 2 then 1 else 2 end grp
from sample_data s
)
SELECT BRANCH, ACTIVITIES, NTILE(5) OVER (ORDER BY ranked_grp, BRANCH) AS bucket
FROM (
select BRANCH, ACTIVITIES
, dense_rank()over(
PARTITION BY grp
order by decode(grp, 1, activities, -1 * activities)
) ranked_grp
from stats_cols_added_tab t
) t
order by ranked_grp, BRANCH
;
Upvotes: 0
Reputation: 1270431
Unfortunately, this is a bin-packing problem and a "perfect" solution requires -- essentially -- searching through all possible assignments of buckets and then choosing the "best" one. And such an approach is not really suitable for SQL.
For a "good-enough" solution, though, something like a round-robin approach often works well. Simply enumerate the branches from biggest to smallest and assign them to buckets:
select a.branch,
1 + mod(seqnum, 5) as bucket
from (select a.branch, count(*) as cnt,
row_number() over (order by count(*)) desc as seqnum
from activities a
group by a.branch
) a;
Because of the ordering, this is going to generally create buckets of different sizes. So, a slight variation assigns the buckets as 1-2-3-4-5-5-4-3-2-1:
select a.branch,
(case when mod(seqnum, 10) in (0, 9) then 1
when mod(seqnum, 10) in (1, 8) then 2
when mod(seqnum, 10) in (2, 7) then 3
when mod(seqnum, 10) in (3, 6) then 4
when mod(seqnum, 10) in (4, 5) then 5
end) as bucket
from (select a.branch, count(*) as cnt,
row_number() over (order by count(*)) desc as seqnum
from activities a
group by a.branch
) a;
Upvotes: 1