I want to cut continuous data into bins of equal width. The bin width should be adapted so that the minimum number of observations in each bin equals a specified number. Is there already a function in R that does this?
I don't know of such a function. I would use a while loop that increases the number of bins by one in each iteration, as long as every bin still holds at least the requested number of observations.
equalBins <- function(values, min_per_bin){
  # before doing anything, check whether dividing the variable into the smallest useful number of bins (2) is possible at all
  if(length(values) / 2 < min_per_bin){
    print("Can not cut variable with this min_per_bin")
  } else {
    # cut the values into n bins of equal width from min(values) to max(values);
    # length.out avoids floating-point gaps where seq(..., by = width) can miss max(values)
    cut_into <- function(n){
      cut(values, seq(min(values), max(values), length.out = n + 1), include.lowest = TRUE)
    }
    # start with a single bin, which holds all observations
    bin_number <- 1
    cut_variable <- cut_into(bin_number)
    # try one more bin per iteration and keep the finer cut only if every bin
    # still holds at least min_per_bin observations; otherwise stop
    while(bin_number < length(values)){
      candidate <- cut_into(bin_number + 1)
      if(min(table(candidate)) < min_per_bin) break
      cut_variable <- candidate
      bin_number <- bin_number + 1
    }
    return(cut_variable)
  }
}
Examples:
# some vector
vec <- 0:100
# min per bin 25
equalBins(values= vec, min_per_bin= 25)
Levels: [0,25] (25,50] (50,75] (75,100]
# min per bin 33
equalBins(values= vec, min_per_bin= 33)
Levels: [0,33.3] (33.3,66.7] (66.7,100]
# not possible to cut
equalBins(values= vec, min_per_bin= 89)
"Can not cut variable with this min_per_bin"
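To see why the loop settles on four bins for min_per_bin = 25, you can tabulate the bin counts for a given number of equal-width bins directly. This is a small sketch; `bin_counts` is a hypothetical helper introduced only for illustration, and it cuts the same way as the loop above (equal-width breaks from min to max, lowest value included).

```r
# Hypothetical helper (illustration only): count observations per bin when
# values are cut into n equal-width bins spanning min(values)..max(values).
bin_counts <- function(values, n) {
  breaks <- seq(min(values), max(values), length.out = n + 1)
  table(cut(values, breaks, include.lowest = TRUE))
}

vec <- 0:100
# 4 bins: the smallest bin holds 25 observations, so min_per_bin = 25 is met
print(bin_counts(vec, 4))
# 5 bins: the smallest bin holds only 20 observations, so 5 bins are too many
print(bin_counts(vec, 5))
```

Note that this greedy search stops at the first bin count that fails; because counts are not guaranteed to decrease monotonically as bins are added, a larger bin number could occasionally satisfy the constraint again.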