Reputation: 532
I already had a look here, where the cut function is used. However, I haven't been able to come up with a clever solution for my situation.
First, some example data that I currently have:
df <- data.frame(
  Category = LETTERS[1:20],
  Nber_within_category = c(rep(1, 8), rep(2, 3), rep(6, 2), rep(10, 3), 30, 50, 77, 90)
)
I would like to add a third column that forms a new category based on the Nber_within_category column. In this example, how can I create e.g. Category_new such that the Nber_within_category in each new category totals at least 5, with the constraint that if a Category already has Nber_within_category >= 5, the original category is kept?
So for example, it should look like this:
df <- data.frame(
  Category = LETTERS[1:20],
  Nber_within_category = c(rep(1, 8), rep(2, 3), rep(6, 2), rep(10, 3), 30, 50, 77, 90),
  Category_new = c(rep('a', 5), rep('b', 4), rep('c', 2), LETTERS[12:20])
)
Upvotes: 0
Views: 329
Reputation: 2650
It's a bit of a hack, but it works:
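library(dplyr)

df %>%
  # bin the running total into groups of width 5 (zero-based index)
  mutate(tmp = floor((cumsum(Nber_within_category) - 1) / 5)) %>%
  # keep the original category (as character, in case it is a factor) when it already has >= 5
  mutate(new_category = ifelse(Nber_within_category >= 5,
                               as.character(Category),
                               letters[tmp + 1]))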
The expression floor((cumsum(Nber_within_category) - 1)/5) assigns the running total to bins of width 5 (the -1 makes a running total of exactly 5 fall into the first bin rather than the second). I use the result as an index to pick a new category label for the rows where Nber_within_category < 5.
It might be easier to understand how the column tmp is defined if you run:
x <- 1:100
data.frame(x, y = floor((x - 1) / 5))
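To see the same binning on the question's data, here is a minimal base R sketch (no dplyr) that keeps the intermediate values visible. The name running_total is only an illustrative helper, and stringsAsFactors = FALSE is set explicitly on the assumption that Category should be treated as character regardless of the R version.

```r
# Example data from the question
df <- data.frame(
  Category = LETTERS[1:20],
  Nber_within_category = c(rep(1, 8), rep(2, 3), rep(6, 2), rep(10, 3), 30, 50, 77, 90),
  stringsAsFactors = FALSE
)

# Running total of the counts, in row order (illustrative helper name)
running_total <- cumsum(df$Nber_within_category)

# Zero-based bin index: totals 1-5 -> 0, 6-10 -> 1, 11-15 -> 2, ...
tmp <- floor((running_total - 1) / 5)

# Keep the original category when it is already large enough,
# otherwise use the bin index to pick a lowercase label
df$new_category <- ifelse(df$Nber_within_category >= 5,
                          df$Category,
                          letters[tmp + 1])

# Inspect the small categories together with the intermediate values
print(data.frame(df, running_total, tmp)[df$Nber_within_category < 5, ])
```

For this data the small rows end up labelled a, b and c, matching the Category_new column in the question. Note that fixed-width bins on the running total do not in general guarantee that every merged group totals at least 5 (group c here totals 4), and letters only provides 26 labels, so this is best treated as a quick approximation for data sorted like the example.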
Upvotes: 1