Categorize data sequences continuously

Question

I am new to R and have a question about adding a new variable to a table. I have data sequences that start with 10 and end with 20, and they appear several times.

Is there a way to group these sequences continuously?

Example:

The data in the column looks like this:

10 3 15 15 19 19 20 20 10 10 11 17 20  ...

I would like to have output like this:

10 group 1
3  group 1
15 group 1
15 group 1
19 group 1
19 group 1
20 group 1
20 group 1
10 group 2
10 group 2
11 group 2
17 group 2
20 group 2
...

Is it possible to program something like that?

Thanks a lot for your help!!

Jaap · Accepted Answer

Using base R you can detect the sequences and create a grouping variable with cumsum and head:

df$grp <- cumsum(df$x == 10 & c(20, head(df$x, -1)) == 20)

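For the example data (see "Used data" below), this gives grp = 1 for the first eight rows and grp = 2 for the last five.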

What this does:

df$x == 10 detects the 10's
c(20, head(df$x, -1)) == 20 detects whether the previous value is equal to 20. The first value is set to 20 because there is no preceding value for the first value of df$x.
By combining these two with & you get a logical vector indicating which values in df$x are equal to 10 and are preceded by a value equal to 20.
Wrapping that in cumsum gives a grouping variable.
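
To make these steps concrete, the intermediate vectors can be inspected one at a time on the example data. This is only an illustrative sketch: the names is_ten, prev_is_twenty and starts are made up here and are not part of the one-liner above.

```r
# Example data (same as under "Used data")
df <- data.frame(x = c(10, 3, 15, 15, 19, 19, 20, 20, 10, 10, 11, 17, 20))

# Step 1: which values are equal to 10?
is_ten <- df$x == 10

# Step 2: shift x down by one position (seeding the first slot with 20)
# and check whether that previous value equals 20
prev_is_twenty <- c(20, head(df$x, -1)) == 20

# Step 3: a new sequence starts where both conditions hold
starts <- is_ten & prev_is_twenty
which(starts)
# [1] 1 9

# Step 4: the running count of sequence starts is the group number
grp <- cumsum(starts)
grp
# [1] 1 1 1 1 1 1 1 1 2 2 2 2 2
```

Note that the second 10 in row 10 does not start a new group, because its preceding value is 10 rather than 20.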

Or with data.table:

library(data.table)
setDT(df)[, grp := cumsum(x == 10 & c(20, head(x, -1)) == 20)][]

Or with dplyr:

library(dplyr)
df %>% 
  mutate(grp = cumsum(x == 10 & lag(x, default = 20) == 20))

You can use paste/paste0 to add text to the group-label:

paste0('group_', cumsum(df$x == 10 & c(20, head(df$x, -1)) == 20))
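
As a small sketch of how the labelled version might be stored and checked, the labels can be assigned to a column and counted with table. The column name label is just a placeholder chosen for this example.

```r
# Example data (same as under "Used data")
df <- data.frame(x = c(10, 3, 15, 15, 19, 19, 20, 20, 10, 10, 11, 17, 20))

# Store the text labels in a new column (the name 'label' is arbitrary)
df$label <- paste0('group_', cumsum(df$x == 10 & c(20, head(df$x, -1)) == 20))

# Count how many rows fall into each group
table(df$label)
# group_1 has 8 rows and group_2 has 5 rows

# Split the original values by group label
split(df$x, df$label)
```

Keeping the numeric grp column alongside the text label can be handy, since numeric groups sort in order while labels such as group_10 would sort before group_2.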

Used data:

df <- data.frame(x = c(10, 3, 15, 15, 19, 19, 20, 20, 10, 10, 11, 17, 20))
