Antti
Antti

Reputation: 1293

Group a dataframe based on sequence breaks in a column?

I have a data.frame, which has a column of integer values. I need to form a grouping variable that identifies sequence breaks in that column. For instance, I could create another column of ascending integers that adds one whenever the original column's value is not greater than its lagged value. How do I do this?

E.g. if I have a data.frame like this:

df <- data.frame(A = c(1,2,4,6,78,3,56,78,23))

I need some way to produce new table with column B:

df$B <- c(1,1,1,1,1,2,2,2,3)

I have tried e.g. with dplyr:

df %>% mutate(B = 1,
              B = case_when(A < lag(A), B + 1))

That is not quite correct.

Upvotes: 5

Views: 706

Answers (2)

akrun
akrun

Reputation: 886938

Or using data.table

library(data.table)
setDT(df)[, B := cumsum(A - shift(A, fill = first(A)) < 0) + 1][]
#    A B
#1:  1 1
#2:  2 1
#3:  4 1
#4:  6 1
#5: 78 1
#6:  3 2
#7: 56 2
#8: 78 2
#9: 23 3

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388807

We can use cumsum and diff which will increment the value every time the sequence is broken

cumsum(c(-1, diff(df$A)) < 0)
#[1] 1 1 1 1 1 2 2 2 3

We can also integrate into dplyr chain to get

library(dplyr)

df %>%
  mutate(B = cumsum(c(-1, diff(A)) < 0))

#   A B
#1  1 1
#2  2 1
#3  4 1
#4  6 1
#5 78 1
#6  3 2
#7 56 2
#8 78 2
#9 23 3

A hacky way using lag could be

df %>%
  mutate(B = cumsum(c(-1, (A - lag(A))[-1]) < 0))

Upvotes: 5

Related Questions