Reputation: 859
I would like to assign an index/group number to each row indicating what percentile it belongs to. For instance, the code I have now is:
SELECT User, Revenue,
(int)(100*(PERCENT_RANK() OVER(ORDER BY Revenue)))/5 AS Percentile
FROM RevenueDistribution;
So a Percentile
0 indicates 0-5%, 1 is 5-10%, etc. However, using equally spaced intervals means the number of rows is uneven, as the first interval has many more rows than the others because there are many rows with Revenue=0
in the data. For instance, there may be over a million users with Percentile=0
, and then a few hundred thousand users each for Percentile=1,2,...
.
Is there a way to assign an index like this, but with an equal number of rows per interval, as opposed to having equally spaced intervals but with an unequal number of rows?
Upvotes: 1
Views: 876
Reputation: 175924
Based on your description I think that you are searching for NTILE:
NTILE(N) over_clause
Divides a partition into N groups (buckets), assigns each row in the partition its bucket number, and returns the bucket number of the current row within its partition. For example, if N is 4, NTILE() divides rows into quartiles. If N is 100, NTILE() divides rows into percentiles.
SELECT User, Revenue,
NTILE(100) OVER(ORDER BY Revenue) AS bucket_num
FROM RevenueDistribution;
Upvotes: 2