Antti

Reputation: 1293

Creating a categorical variable for sequence breaks in R?

I have a dataframe with two columns for year and age, e.g.:

df <- data.frame(year = 1980:2000, age = c(40:45, 31:40, 32:36))

I need to create a categorical variable that identifies each age sequence. That would look something like this:

df$seq <- as.character(c(rep(1,6), rep(2,10), rep(3,5)))

Any ideas how to do this efficiently? I have managed to create a dummy variable that marks the sequence breaks:

library(dplyr)
df <- df %>% mutate(brk = case_when(age - lag(age) != 1 ~ 1, TRUE ~ 0))

but I'm struggling with filling in the rest.

Upvotes: 1

Views: 373

Answers (1)

talat

Reputation: 70256

You have almost done it already. You just need to create a cumulative sum (cumsum) of your brk column:

df %>% mutate(brk = cumsum(case_when(age - lag(age) != 1 ~ 1, TRUE ~ 0)))

You can add 1 to the whole vector if you want to start the first sequence from 1 instead of 0.
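For comparison, here is a minimal base R sketch of the same cumulative-sum idea. It is not the code from the answer above: it swaps `lag()` and `case_when()` for `diff()`, and it assumes that a new sequence starts wherever the age does not increase by exactly 1. Marking the first row as a break makes the ids start at 1 without any extra addition.

```r
# Example data from the question
df <- data.frame(year = 1980:2000, age = c(40:45, 31:40, 32:36))

# diff() returns age[i] - age[i - 1], so it has one element fewer than age.
# Prepending TRUE marks the first row as the start of the first sequence.
is_break <- c(TRUE, diff(df$age) != 1)

# The running count of breaks is the sequence id; store it as character
# to match the categorical variable asked for in the question.
df$seq <- as.character(cumsum(is_break))
```

Note that `diff()` would propagate any `NA` in `age` into `cumsum()`, so this sketch assumes the column has no missing values.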

Upvotes: 2
