T X

Reputation: 613

How to label consecutive data points (clusters) as different groups

I'd like to label each run of consecutive data points that share the same property (in this example, "TRUE") as a separate group (e.g., Group1, Group2, ...).

Below is the example data:

dt <- data.frame(value = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE))

The "group" column is what I want to achieve:

 value group
  TRUE    G1
  TRUE    G1
 FALSE  <NA>
 FALSE  <NA>
  TRUE    G2
 FALSE  <NA>
  TRUE    G3
  TRUE    G3
  TRUE    G3
  TRUE    G3

Upvotes: 1

Views: 124

Answers (2)

Ronak Shah

Reputation: 389325

Another option with rle:

dt$group <-  paste0('G', with(rle(dt$value), rep(cumsum(values), lengths)))
dt$group[!dt$value] <- NA
dt

#   value group
#1   TRUE    G1
#2   TRUE    G1
#3  FALSE  <NA>
#4  FALSE  <NA>
#5   TRUE    G2
#6  FALSE  <NA>
#7   TRUE    G3
#8   TRUE    G3
#9   TRUE    G3
#10  TRUE    G3
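
To see why this works, the one-liner can be unpacked into its intermediate steps. This is only a sketch on the example data from the question; the names r, run_id, and group are illustrative and not part of the original answer.

```r
dt <- data.frame(value = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE))

# rle() collapses the vector into runs: each run's length and its value
r <- rle(dt$value)
r$lengths  # 2 2 1 1 4
r$values   # TRUE FALSE TRUE FALSE TRUE

# cumsum() over the run values increases only when a TRUE run starts
run_id <- cumsum(r$values)  # 1 1 2 2 3

# expand the per-run ids back to one id per row
group <- paste0("G", rep(run_id, r$lengths))

# FALSE rows still carry the id of the preceding TRUE run, so blank them out
group[!dt$value] <- NA
group
```

Because the counter is driven by the run values themselves, the numbering starts at G1 whether the data begin with TRUE or FALSE.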

Upvotes: 1

akrun

Reputation: 887951

In base R, we can use rle:

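dt$group <- inverse.rle(within.list(rle(dt$value), {
  # TRUE runs become 1, FALSE runs become NA
  values <- NA^!values
  # label the remaining (TRUE) runs G1, G2, ... in order of appearance
  values[!is.na(values)] <- paste0("G", seq_along(values[!is.na(values)]))
}))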

Output:

dt
   value group
1   TRUE    G1
2   TRUE    G1
3  FALSE  <NA>
4  FALSE  <NA>
5   TRUE    G2
6  FALSE  <NA>
7   TRUE    G3
8   TRUE    G3
9   TRUE    G3
10  TRUE    G3
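
The NA^!values idiom can look cryptic. It relies on two arithmetic rules in R: anything raised to the power 0 is 1 (including NA), while NA raised to the power 1 stays NA. A minimal sketch on a toy vector (the names values and mask are illustrative):

```r
values <- c(TRUE, FALSE, TRUE)

# !values is FALSE TRUE FALSE, which is treated as 0 1 0 when used as an exponent
mask <- NA^!values
mask  # 1 NA 1

# multiplying by the mask keeps numbers at TRUE positions and sets the rest to NA
ids <- c(1, 1, 2) * mask
ids   # 1 NA 2
```

So the mask passes TRUE positions through unchanged and turns FALSE positions into NA without an explicit ifelse().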

Or a more compact option with dplyr and stringr:

library(dplyr)
library(stringr)
dt %>%
   # seed the counter with value[1] so numbering starts at G1 even when the data begin with FALSE
   mutate(group = str_c('G', cumsum(c(value[1], diff(value) > 0)) * NA^!value))
   value group
1   TRUE    G1
2   TRUE    G1
3  FALSE  <NA>
4  FALSE  <NA>
5   TRUE    G2
6  FALSE  <NA>
7   TRUE    G3
8   TRUE    G3
9   TRUE    G3
10  TRUE    G3

Or with rleid from data.table:

library(data.table)
setDT(dt)[, tmp := rleid(value)][(value), 
     group := paste0("G", .GRP), tmp][, tmp := NULL][]
    value group
 1:  TRUE    G1
 2:  TRUE    G1
 3: FALSE  <NA>
 4: FALSE  <NA>
 5:  TRUE    G2
 6: FALSE  <NA>
 7:  TRUE    G3
 8:  TRUE    G3
 9:  TRUE    G3
10:  TRUE    G3

Upvotes: 1
