aldredd

Reputation: 266

Cutting data into bins, with partitioning, in R

I'm using the cut2 function from the Hmisc package in R to cut my dataset into a fixed number of bins, e.g.:

library(Hmisc)
as.numeric(cut2(Catchment_Population_Log, g=4))

But is there a straightforward way to add a partition level, so that I get n cuts per category? That is, I'm looking to apply cut2 (or similar) independently within each category. (When I do something similar in SQL, I use PARTITION BY.)

So in my head, it would be something like this:

as.numeric(cut2(Catchment_Population_Log, g=4, partition_by=CategoryID))

But I can't see anything in the cut2 documentation that would allow this. I've played around with split(), but haven't been able to get anything to work.

Example data, including the output I'm looking to achieve:

library(Hmisc)
library(dplyr)
category <- c('Category_1','Category_1','Category_1','Category_1','Category_2','Category_2','Category_2','Category_2','Category_3','Category_3','Category_3','Category_3')
catchment_population_log <- c(0.3,0.2,0.1,0.4,0.4,0.2,0.6,0.9,0.2,0.6,0.2,0.4)
exp_result <- c(2,1,1,2,1,1,2,2,1,2,1,2)
data <- data.frame(category, catchment_population_log)

# Result just using cut2 - data is cut into 2 bins
# based on their catchment_population_log value
data %>%
  mutate(just_using_cut2 = as.numeric(cut2(catchment_population_log,g=2)))

# This time, I'll manually append the expected result; each category
# should be split into 2 bins based on its catchment_population_log value,
# independently of the other categories.
# As a result, a 0.4 value might fall in bin 1 for one category,
# but bin 2 for another category

data %>%
  mutate(just_using_cut2 = as.numeric(cut2(catchment_population_log,g=2))) %>%
  cbind(exp_result)

Upvotes: 0

Views: 365

Answers (1)

aldredd

Reputation: 266

Thanks to Moody_Mudskipper, I was able to get this to work exactly how I needed.

# This works with cut in base, as well as cut2, but I'm using cut2
library(Hmisc)
library(dplyr)  # provides %>%, group_by() and mutate()
data %>%
  group_by(category) %>%
  mutate(population_bin = as.numeric(cut2(catchment_population_log, g = 2)))
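
For anyone who wants the same per-category binning without Hmisc or dplyr, here is a minimal base R sketch. It is not what I actually used: bin_by_quantile is a hypothetical helper name, and it assumes that quantile-based breaks passed to cut() are an acceptable stand-in for cut2's g argument. cut2 chooses its cut points differently, so the bins may not match cut2 on other data.

```r
# Hypothetical helper (not part of Hmisc): split x into g quantile-based bins
# and return the bin number of each value.
bin_by_quantile <- function(x, g = 2) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = g + 1))
  # Note: cut() stops with an error if tied values make the breaks non-unique
  as.numeric(cut(x, breaks = breaks, include.lowest = TRUE))
}

# Same example data as in the question
category <- rep(c("Category_1", "Category_2", "Category_3"), each = 4)
catchment_population_log <- c(0.3, 0.2, 0.1, 0.4, 0.4, 0.2, 0.6, 0.9, 0.2, 0.6, 0.2, 0.4)
data <- data.frame(category, catchment_population_log)

# ave() applies the helper within each category and returns the
# results in the original row order
data$population_bin <- ave(data$catchment_population_log, data$category,
                           FUN = bin_by_quantile)
data
```

ave() plays the role of PARTITION BY here: the helper only ever sees the values of one category at a time. On the example data this should give the same bins as exp_result in the question.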

Upvotes: 1
