Reputation: 75
I need to split a vector into groups of consecutive repeated elements, starting a new group every time the element value changes. For example:
test_vector <- c("string1", "string1", "string1", "string2",
"string2", "string1", "string1", "string3")
must become:
$`1`
[1] "string1" "string1" "string1"
$`2`
[1] "string2" "string2"
$`3`
[1] "string1" "string1"
$`4`
[1] "string3"
If I try split(test_vector, test_vector), I get the wrong output, because it groups all equal values together regardless of where they occur:
$string1
[1] "string1" "string1" "string1" "string1" "string1"
$string2
[1] "string2" "string2"
$string3
[1] "string3"
I wrote some code that achieves this, but it seems unnecessarily long, and I feel like I'm missing something much simpler:
# find indices where splitting will occur:
split_points <- rep(FALSE, length(test_vector))
for (i in seq_along(test_vector)) {
  if (i != 1) {
    if (test_vector[i] != test_vector[i - 1]) {
      split_points[i] <- TRUE
    }
  }
}
split_points <- c(1, which(split_points))
# create split vector:
split_code <- rep(1, length(test_vector))
for (j in seq_along(split_points)) {
  if (j != length(split_points)) {
    split_code[split_points[j]:(split_points[j + 1] - 1)] <- j
  } else {
    split_code[split_points[j]:length(test_vector)] <- j
  }
}
split_result <- split(test_vector, split_code)
split_result
$`1`
[1] "string1" "string1" "string1"
$`2`
[1] "string2" "string2"
$`3`
[1] "string1" "string1"
$`4`
[1] "string3"
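As a sketch (not taken from the answers below) of how the two loops could collapse while keeping the same idea: the first loop's split points can be computed by comparing the vector with itself shifted by one, and findInterval can then map each position to the run it belongs to, replacing the second loop. The name run_starts is illustrative only.

```r
test_vector <- c("string1", "string1", "string1", "string2",
                 "string2", "string1", "string1", "string3")

# Positions where a new run begins: position 1, plus every position whose
# value differs from the one before it (the same thing the first loop finds)
run_starts <- c(1, which(test_vector[-1] != test_vector[-length(test_vector)]) + 1)

# For each position, findInterval() counts how many run starts are <= it,
# which is the index of the run that position belongs to (replaces loop 2)
split_code <- findInterval(seq_along(test_vector), run_starts)

split_result <- split(test_vector, split_code)
split_result
```

For this input run_starts is 1 4 6 8 and split_code is 1 1 1 2 2 3 3 4, the same codes the loops produce.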
If anyone could help me find a simpler solution this would be much appreciated!
Upvotes: 3
Views: 422
Reputation: 102920
A base R option is to use findInterval + cumsum + rle, i.e.,
res <- split(test_vector,
findInterval(seq_along(test_vector),
cumsum(rle(test_vector)$lengths),
left.open = TRUE))
such that
> res
$`0`
[1] "string1" "string1" "string1"
$`1`
[1] "string2" "string2"
$`2`
[1] "string1" "string1"
$`3`
[1] "string3"
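A sketch that prints the intermediate values of this approach for the same test_vector; the variable names run_lengths, run_ends and grp are illustrative, not from the answer. With left.open = TRUE the ids start at 0, so the list names run from "0" to "3"; adding 1 to the ids gives 1-based names if those are wanted.

```r
test_vector <- c("string1", "string1", "string1", "string2",
                 "string2", "string1", "string1", "string3")

# rle() collapses consecutive duplicates into run lengths: 3 2 2 1
run_lengths <- rle(test_vector)$lengths

# cumsum() turns the lengths into the position where each run ends: 3 5 7 8
run_ends <- cumsum(run_lengths)

# With left.open = TRUE the intervals are open on the left and closed on the
# right, so positions 1..3 (up to and including the first end) get 0,
# positions 4..5 get 1, and so on
grp <- findInterval(seq_along(test_vector), run_ends, left.open = TRUE)
grp
# 0 0 0 1 1 2 2 3

res <- split(test_vector, grp)
names(res)
# "0" "1" "2" "3"
```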
Upvotes: 0
Reputation: 11
f = cumsum(c(TRUE, test_vector[-length(test_vector)] != test_vector[-1]))
split(test_vector, f)
OR
with(rle(test_vector), Map(rep, values, lengths))
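A sketch of what both variants compute for the example input (the names changed, by_id and by_value are illustrative). The first compares each element with the next one and turns the change flags into group ids with cumsum(); the Map() variant rebuilds each run from rle(), and because Map() names its result after a character first argument, its list names are the run values, so a repeated value gives duplicate names.

```r
test_vector <- c("string1", "string1", "string1", "string2",
                 "string2", "string1", "string1", "string3")

# TRUE wherever an element differs from the one after it; the leading TRUE
# starts the first run, and cumsum() turns the flags into run ids
changed <- test_vector[-length(test_vector)] != test_vector[-1]
f <- cumsum(c(TRUE, changed))
f
# 1 1 1 2 2 3 3 4

by_id <- split(test_vector, f)

# Same groups, but named by run value instead of by run number
by_value <- with(rle(test_vector), Map(rep, values, lengths))
names(by_value)
# "string1" "string2" "string1" "string3"
```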
Upvotes: 1
Reputation: 887971
In base R, we can use rle to get the run-length encoding of the vector:
grp <- with(rle(test_vector), rep(seq_along(values), lengths))
Use that to split the vector:
split(test_vector, grp)
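A short sketch of the intermediate values for the example input (the name runs is illustrative): rle() stores one entry per run, and rep(seq_along(values), lengths) repeats each run's index as many times as the run is long, which gives one group id per element.

```r
test_vector <- c("string1", "string1", "string1", "string2",
                 "string2", "string1", "string1", "string3")

runs <- rle(test_vector)
runs$values   # "string1" "string2" "string1" "string3"
runs$lengths  # 3 2 2 1

# Repeat each run's index (1, 2, 3, 4) as many times as that run is long
grp <- with(runs, rep(seq_along(values), lengths))
grp           # 1 1 1 2 2 3 3 4

split(test_vector, grp)
```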
With data.table, rleid gives a run id that increments whenever adjacent elements differ:
library(data.table)
split(test_vector, rleid(test_vector))
Upvotes: 1