Reputation: 56219
This looks like an easy task, but I can't figure out a simpler way. I have a vector x below and need to create group names for runs of consecutive identical values. My attempt uses rle; are there better ideas?
# data
x <- c(1,1,1,2,2,2,3,2,2,1,1)
# make groups
rep(paste0("Group_", 1:length(rle(x)$lengths)), rle(x)$lengths)
# [1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4"
# [9] "Group_4" "Group_5" "Group_5"
Upvotes: 8
Views: 1351
Reputation: 51592
Using rleid from data.table:
library(data.table)
rleid(x, prefix = "Group_")
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"
Upvotes: 11
Reputation: 181
group() from groupdata2 can create groups from a list of group starting points, using the l_starts method. Setting n to "auto" makes it find the group starts automatically:
x <- c(1,1,1,2,2,2,3,2,2,1,1)
groupdata2::group(x, n = "auto", method = "l_starts")
## # A tibble: 11 x 2
## # Groups:   .groups [5]
##     data .groups
##    <dbl> <fct>
##  1     1 1
##  2     1 1
##  3     1 1
##  4     2 2
##  5     2 2
##  6     2 2
##  7     3 3
##  8     2 4
##  9     2 4
## 10     1 5
## 11     1 5
There's also the differs_from_previous() function, which finds the values, or the indices of the values, that differ from the previous value by some threshold(s).
library(groupdata2)
# The values to start groups at
differs_from_previous(x, threshold = 1,
                      direction = "both")
## [1] 2 3 2 1
# The indices to start groups at
differs_from_previous(x, threshold = 1,
                      direction = "both",
                      return_index = TRUE)
## [1] 4 7 8 10
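To see how those start indices relate to the group numbers, here is a minimal base R sketch. It is not how groupdata2 works internally; the indices are simply copied from the output above, and the names starts and groups are only for illustration.

```r
x <- c(1, 1, 1, 2, 2, 2, 3, 2, 2, 1, 1)

# Start indices copied from the differs_from_previous() output above
starts <- c(4, 7, 8, 10)

# Mark each position where a new group begins, then count the marks so far.
# The first element always belongs to group 1, hence the + 1.
groups <- cumsum(seq_along(x) %in% starts) + 1
groups
# [1] 1 1 1 2 2 2 3 4 4 5 5
```

The group numbers match the .groups column returned by group() above.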
Upvotes: 2
Reputation:
Using cumsum but not relying on the data being numeric:
paste0("Group_", 1 + c(0, cumsum(x[-length(x)] != x[-1])))
# [1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"
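As a quick check that this does not depend on numeric input, the same expression can be run on a character vector. The vector y and the name groups_chr are made up for illustration.

```r
# Hypothetical character input; diff() would not work here
y <- c("a", "a", "b", "b", "a", "c", "c")

# Compare each element with its successor; every TRUE starts a new run
groups_chr <- paste0("Group_", 1 + c(0, cumsum(y[-length(y)] != y[-1])))
groups_chr
# [1] "Group_1" "Group_1" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4"
```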
Upvotes: 3
Reputation: 132949
Using diff and cumsum:
paste0("Group_", cumsum(c(1, diff(x) != 0)))
#[1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2" "Group_2" "Group_3" "Group_4" "Group_4" "Group_5" "Group_5"
(If your values are floating point, you might have to avoid != and use a tolerance instead.)
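A rough sketch of the tolerance variant, in case it helps. The example vector x_fp and the tolerance value are arbitrary assumptions; pick a tolerance that suits the scale of your data.

```r
# 0.1 + 0.2 is not exactly equal to 0.3 in floating point arithmetic
x_fp <- c(0.1 + 0.2, 0.3, 0.3, 1.5, 1.5)

# Exact comparison splits the first two values into separate groups
exact <- paste0("Group_", cumsum(c(1, diff(x_fp) != 0)))
exact
# [1] "Group_1" "Group_2" "Group_2" "Group_3" "Group_3"

# Assumed tolerance: differences smaller than this count as "no change"
tol <- 1e-8
tolerant <- paste0("Group_", cumsum(c(1, abs(diff(x_fp)) > tol)))
tolerant
# [1] "Group_1" "Group_1" "Group_1" "Group_2" "Group_2"
```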
Upvotes: 10