I'm using the cut2 function from the Hmisc library in R to cut my dataset into a fixed number of bins, e.g.:
library(Hmisc)
as.numeric(cut2(Catchment_Population_Log, g=4))
Is there a straightforward way to add a partition level, so that I get n bins per category? That is, I want to apply cut2 (or similar) independently within each category. (In SQL, I would use PARTITION BY for this.)
In my head, it would look something like this:
as.numeric(cut2(Catchment_Population_Log, g=4, partition_by=CategoryID))
But I can't see anything in the cut2 documentation that would allow this. I've experimented with split(), but haven't been able to get anything to work.
Example data, including the output I'm looking to achieve:
library(Hmisc)
library(dplyr)
category <- c('Category_1','Category_1','Category_1','Category_1','Category_2','Category_2','Category_2','Category_2','Category_3','Category_3','Category_3','Category_3')
catchment_population_log <- c(0.3,0.2,0.1,0.4,0.4,0.2,0.6,0.9,0.2,0.6,0.2,0.4)
exp_result <- c(2,1,1,2,1,1,2,2,1,2,1,2)
data <- data.frame(category, catchment_population_log)
# Result just using cut2 - data is cut into 2 bins
# based on their catchment_population_log value
data %>%
  mutate(just_using_cut2 = as.numeric(cut2(catchment_population_log, g = 2)))
# This time, I'll manually attach the expected result; each category
# should be split into 2 bins based on its catchment_population_log
# values, independently of the other categories.
# As a result, a 0.4 value might fall in bin 1 for one category,
# but bin 2 for another category
data %>%
  mutate(just_using_cut2 = as.numeric(cut2(catchment_population_log, g = 2))) %>%
  cbind(exp_result)
Thanks to Moody_Mudskipper, I was able to get this to work exactly how I needed.
# This works with cut from base R as well as cut2; I'm using cut2 here.
library(Hmisc)
library(dplyr)  # provides %>%, group_by() and mutate()
data %>%
  group_by(category) %>%
  mutate(population_bin = as.numeric(cut2(catchment_population_log, g = 2)))
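For anyone who wants the same per-category binning without Hmisc or dplyr, here is a minimal base R sketch. It assumes quantile-based breaks passed to cut() are an acceptable stand-in for cut2; bin_by_quantile is a hypothetical helper name I made up, and on other data its bin edges may not match cut2 exactly (for instance when there are many tied values).

```r
# Example data from the question
category <- rep(c("Category_1", "Category_2", "Category_3"), each = 4)
catchment_population_log <- c(0.3, 0.2, 0.1, 0.4, 0.4, 0.2, 0.6, 0.9, 0.2, 0.6, 0.2, 0.4)
data <- data.frame(category, catchment_population_log)

# Hypothetical helper (not part of any package): cut x into g
# quantile-based bins and return the bin number of each element.
bin_by_quantile <- function(x, g = 2) {
  # g + 1 break points at evenly spaced quantiles; unique() drops
  # duplicate break points that ties in x can produce
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = g + 1)))
  # left-closed intervals; include.lowest = TRUE keeps the maximum in the last bin
  as.numeric(cut(x, breaks = breaks, include.lowest = TRUE, right = FALSE))
}

# ave() applies FUN within each category and returns the results
# in the original row order
data$population_bin <- ave(data$catchment_population_log, data$category,
                           FUN = bin_by_quantile)
```

On the example data this gives the same bins as exp_result in the question. ave() plays the role of PARTITION BY here: the grouping variable decides which rows are binned together, and the output lines up with the input rows.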