Reputation: 326
So I am trying to create a Probability Density Function from data in an Oracle SQL table through a SQL query. So consider the below table:
Name | Spend
--------------
Anne | 110
Phil | 40
Sue | 99
Jeff | 190
Stan | 80
Joe | 90
Ben | 100
Lee | 85
Now if I want to create a PDF from that data I need to count the number of times each customer spends with in a certain quanta (between 0 and 50 or between 50 and 100). An example graph would look something like this (forgive my poor ascii art):
5|
4| *
3| *
2| * *
1|* * * *
|_ _ _ _
5 1 1 2
0 0 5 0
0 0 0
So the axis are:
I am currently using the Oracle SQL CASE function to determine whether the spend falls within the bucket and then summing the number of customers that do. However this is taking forever as it there are a couple of million records.
Any idea on how to do this effectively?
Thanks!
Upvotes: 4
Views: 409
Reputation: 10525
You can try using WIDTH_BUCKET function.
select bucket , count(name)
from (select name, spend,
WIDTH_BUCKET(spend, 0, 200, 4) bucket
from mytable
)
group by bucket
order by bucket;
Here I have divided the range 0 to 200 into 4 bucket. And the function assigns a bucket number to each value. You can group by this bucket and count how many reocrds fall in each bucket.
Demo here.
You can even display the actual bucket range.
select bucket,
cast(min_value + ((bucket-1) * (max_value-min_value)/buckets) as varchar2(10))
||'-'
||cast(min_value + ((bucket) * (max_value-min_value)/buckets) as varchar2(10)),
count(name) c
from (select name,
spend,
WIDTH_BUCKET(spend, min_value, max_value, buckets) bucket
from mytable)
group by bucket
order by bucket;
Sample here.
Upvotes: 4
Reputation: 2602
SELECT COUNT(*) y_axis,
X_AXIS
FROM
(SELECT COUNT(*)y_axis,
CASE
WHEN spend <= 50 THEN 50
WHEN spend < 100 AND spend > 50 THEN 100
WHEN spend < 150 AND spend >= 100 THEN 150
WHEN spend < 200 AND spend >= 150 THEN 200
END x_axis
FROM your_table
GROUP BY spend
)
GROUP BY X_AXIS;
y_axis x_axis
-----------------
4 100
1 50
1 200
2 150
Upvotes: 0