Benni

Cut a range into equal-width bins with a minimum number of observations per bin

I want to cut continuous data into bins of equal width. The bin width should be adapted so that each bin contains at least a specified minimum number of observations. Is there already a function in R that does this?

Answers (1)

user11538509

I do not know of such a function. I would use a while loop that increases the number of bins in each iteration as long as every bin still contains at least the required number of observations.
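Each pass of the loop comes down to a single check: cut the values into k equal-width bins and test whether the smallest bin still reaches min_per_bin. The sketch below isolates that check in a hypothetical helper, binCounts(), which is not part of base R or of the answer's function; it assumes numeric input without NAs.

```r
# Hypothetical helper (not from base R or the answer): counts the
# observations in each of k equal-width bins spanning the range of `values`.
binCounts <- function(values, k) {
  breaks <- seq(min(values), max(values), length.out = k + 1)
  table(cut(values, breaks, include.lowest = TRUE))
}

vec <- 0:100
counts4 <- binCounts(vec, 4)  # 26, 25, 25, 25 observations
counts5 <- binCounts(vec, 5)  # 21, 20, 20, 20, 20 observations

# With min_per_bin = 25, 4 bins pass the check but 5 bins do not
min(counts4) >= 25  # TRUE
min(counts5) >= 25  # FALSE
```

The first bin holds one extra observation because include.lowest = TRUE closes it on both ends.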

equalBins <- function(values, min_per_bin){
  # min_per_bin must be positive, otherwise the loop below never stops
  stopifnot(min_per_bin >= 1)
  # before doing anything, check whether dividing the variable into the minimal possible number of bins (2) is okay
  if(length(values) / 2 < min_per_bin){
    stop("Cannot cut variable with this min_per_bin")
  }
  # firstly we see what range the vector has
  lo <- min(values)
  hi <- max(values)
  if(lo == hi){
    stop("Cannot cut a variable whose values are all equal")
  }

  # starting with one bin covering the whole range
  bin_number <- 1
  cut_variable <- cut(values, c(lo, hi), include.lowest = TRUE)
  # candidate with one more bin; seq(..., length.out = ) makes the last break exactly hi
  next_cut <- cut(values, seq(lo, hi, length.out = bin_number + 2), include.lowest = TRUE)

  # increase bin_number by 1 in each iteration as long as no bin of the candidate
  # has fewer observations than we asked for; keep the last binning that passed
  while(min(table(next_cut)) >= min_per_bin){
    cut_variable <- next_cut
    bin_number <- bin_number + 1
    next_cut <- cut(values, seq(lo, hi, length.out = bin_number + 2), include.lowest = TRUE)
  }
  return(cut_variable)
}

Examples:

# some vector
vec <- 0:100

# min per bin 25
equalBins(values = vec, min_per_bin = 25)
Levels: [0,25] (25,50] (50,75] (75,100]

# min per bin 33
equalBins(values = vec, min_per_bin = 33)
Levels: [0,33.3] (33.3,66.7] (66.7,100]

# not possible to cut
equalBins(values = vec, min_per_bin = 89)
Error in equalBins(values = vec, min_per_bin = 89) : 
  Cannot cut variable with this min_per_bin
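The loop stops at the first bin number that fails. With clustered data, the smallest bin count does not necessarily shrink steadily as bins are added, so a larger bin number can pass after a smaller one has failed. If that matters, one option is an exhaustive search instead of the while loop: try every feasible bin number from the largest downwards and keep the first one that passes. The name equalBinsExhaustive() is hypothetical; this is a sketch assuming numeric input without NAs, not a package function.

```r
# Hypothetical variant (not part of the original answer): instead of
# stopping at the first bin number that fails, try every feasible bin
# number and return the largest one whose bins all hold >= min_per_bin.
equalBinsExhaustive <- function(values, min_per_bin) {
  stopifnot(min_per_bin >= 1)
  lo <- min(values)
  hi <- max(values)
  if (lo == hi) stop("values must not all be equal")

  # k bins can only each hold min_per_bin observations if k * min_per_bin <= n
  max_bins <- floor(length(values) / min_per_bin)
  if (max_bins < 2) stop("Cannot cut variable with this min_per_bin")

  # Search downwards so the first valid binning found has the most bins
  for (k in max_bins:2) {
    breaks <- seq(lo, hi, length.out = k + 1)
    binned <- cut(values, breaks, include.lowest = TRUE)
    if (min(table(binned)) >= min_per_bin) return(binned)
  }
  stop("No equal-width binning with at least 2 bins satisfies min_per_bin")
}

# Clustered data: 3 bins leave the middle bin empty, but 4 bins do not
x <- c(0, 1.8, 4.2, 6)
nlevels(equalBinsExhaustive(x, min_per_bin = 1))  # 4
```

Stopping at the first failure would return 2 bins for x, since 3 bins already fail. Each candidate costs one cut() and one table() call, so in the worst case (min_per_bin = 1) the search is roughly quadratic in the number of values, which is usually acceptable for moderate vector sizes.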
