I have a data frame with two columns, year and age, e.g.:
df <- data.frame(year = 1980:2000, age = c(40:45, 31:40, 32:36))
I need to create a categorical variable that identifies each sequence of consecutive ages. The result would look something like this:
df$seq <- as.character(c(rep(1,6), rep(2,10), rep(3,5)))
Any ideas on how to do this efficiently? I have managed to create a dummy for sequence breaks:
library(dplyr)
df <- df %>% mutate(brk = case_when(age - lag(age) != 1 ~ 1, TRUE ~ 0))
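As a quick check of what the break dummy above contains on the example data, here is a minimal self-contained sketch (it rebuilds df so it runs on its own). The first row has no predecessor, so lag() returns NA there, the condition evaluates to NA, and case_when() falls through to the catch-all branch, giving 0.

```r
library(dplyr)

df <- data.frame(year = 1980:2000, age = c(40:45, 31:40, 32:36))

# brk is 1 where age does not follow on from the previous row, otherwise 0
df <- df %>% mutate(brk = case_when(age - lag(age) != 1 ~ 1, TRUE ~ 0))

# Rows flagged as the start of a new sequence (row 1 is not flagged,
# because lag() gives NA for it and the TRUE ~ 0 branch applies)
which(df$brk == 1)
```

On this data the flagged rows are 7 (45 to 31) and 17 (40 to 32).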
but I'm struggling with filling in the rest.
You have almost done it already. You just need to take the cumulative sum (cumsum) of your brk column:
df %>% mutate(brk = cumsum(case_when(age - lag(age) != 1 ~ 1, T ~ 0)))
You can add 1 to the whole vector if you want to start the first sequence from 1 instead of 0.
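Putting the answer together, a minimal sketch that adds 1 and converts to character so the column matches the seq column shown in the question. The diff()-based line at the end is a base R alternative that is not part of the original answer: it marks the first row as TRUE, so the count starts at 1 without needing the +1.

```r
library(dplyr)

df <- data.frame(year = 1980:2000, age = c(40:45, 31:40, 32:36))

df <- df %>%
  mutate(
    # cumsum() of the break dummy gives 0 for the first run, 1 for the second, ...;
    # adding 1 makes the first sequence start at 1, and as.character() matches
    # the character column in the question
    seq = as.character(cumsum(case_when(age - lag(age) != 1 ~ 1, TRUE ~ 0)) + 1)
  )

# Base R alternative (not from the original answer): diff() has one element
# fewer than age, so prepend TRUE to mark the start of the first sequence
seq_base <- as.character(cumsum(c(TRUE, diff(df$age) != 1)))

table(df$seq)
```

Both approaches give "1" for the first 6 rows, "2" for the next 10 and "3" for the last 5.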