zx8754
zx8754

Reputation: 56219

Create group names for consecutive values

Looks like an easy task, can't figure out a simpler way. I have an x vector below, and need to create group names for consecutive values. My attempt was using rle, better ideas?

# data
x <- c(1,1,1,2,2,2,3,2,2,1,1)

# make groups
rep(paste0("Group_", 1:length(rle(x)$lengths)), rle(x)$lengths)
# [1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4"
# [9] "Group_4" "Group_5" "Group_5"

Upvotes: 8

Views: 1351

Answers (4)

Sotos
Sotos

Reputation: 51592

Using rleid from data.table,

library(data.table)

rleid(x, prefix = "Group_")
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"

Upvotes: 11

ludvigolsen
ludvigolsen

Reputation: 181

group() from groupdata2 can create groups from a list of group starting points, using the l_starts method. By setting n to auto, it automatically finds group starts:

x <- c(1,1,1,2,2,2,3,2,2,1,1)
groupdata2::group(x, n = "auto", method = "l_starts")

## # A tibble: 11 x 2
## # Groups:   .groups [5]
##     data .groups
##    <dbl> <fct>  
##  1     1 1      
##  2     1 1      
##  3     1 1      
##  4     2 2      
##  5     2 2      
##  6     2 2      
##  7     3 3      
##  8     2 4      
##  9     2 4      
## 10     1 5      
## 11     1 5     

There's also the differs_from_previous() function which finds values, or indices of values, that differ from the previous value by some threshold(s).

# The values to start groups at
differs_from_previous(x, threshold = 1,
                      direction = "both")
## [1] 2 3 2 1

# The indices to start groups at
differs_from_previous(x, threshold = 1,
                      direction = "both",
                      return_index = TRUE)
## [1] 4 7 8 10

Upvotes: 2

user3603486
user3603486

Reputation:

Using cumsum but not relying on the data being numeric:

paste0("Group_", 1 + c(0, cumsum(x[-length(x)] != x[-1])))


[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"

Upvotes: 3

Roland
Roland

Reputation: 132949

Using diff and cumsum :

paste0("Group_", cumsum(c(1, diff(x) != 0)))
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"

(If your values are floating point values, you might have to avoid != and use a tolerance instead.)

Upvotes: 10

Related Questions