Reputation: 243
How can I count repetitions of a set of characters in a vector? Imagine the following vector consisting of "A" and "B":
x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A")
In this example, the first set is the sequence of "A" and "B" from index 1 to 5, the second set is the sequence of "A" and "B" from index 6 to 8, and the third set is the last single "A":
x <- c("A", "A", "A", "B", "B", # set 1
"A", "A", "B", # set 2
"A") # set 3
How can I set a counter for each set of values? I need a vector like this:
c(1, 1, 1, 1, 1, 2, 2, 2, 3)
Thanks.
Upvotes: 8
Views: 387
Reputation: 67778
Alternative 1.
cumsum(c(TRUE, diff(match(x, c("A", "B"))) == -1))
# [1] 1 1 1 1 1 2 2 2 3
Step by step:
match(x, c("A", "B"))
# [1] 1 1 1 2 2 1 1 2 1
diff(match(x, c("A", "B")))
# [1] 0 0 1 0 -1 0 1 -1
diff(match(x, c("A", "B"))) == -1
# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
c(TRUE, diff(match(x, c("A", "B"))) == -1)
# [1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
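The same idea seems to extend beyond two values: a new set starts wherever the position in the reference order drops. A minimal sketch, assuming the values follow a fixed order within each set; the three-letter order `ord` and the vector `x3` are made up for illustration and are not part of the question:

```r
# Hypothetical example with three values; `ord` and `x3` are illustrative only
ord <- c("A", "B", "C")
x3 <- c("A", "B", "C", "A", "C", "B")

pos <- match(x3, ord)              # position of each element in the reference order
new_set <- c(TRUE, diff(pos) < 0)  # a drop in position marks the start of a new set
set_id3 <- cumsum(new_set)
set_id3
# [1] 1 1 1 2 2 3
```

With only "A" and "B", `diff(pos) < 0` is the same test as `== -1` above.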
Alternative 2.
Using data.table::rleid:
library(data.table)
cumsum(c(TRUE, diff(rleid(x) %% 2) == 1))
# [1] 1 1 1 1 1 2 2 2 3
Step by step:
rleid(x)
# [1] 1 1 1 2 2 3 3 4 5
rleid(x) %% 2
# [1] 1 1 1 0 0 1 1 0 1
diff(rleid(x) %% 2)
# [1] 0 0 -1 0 1 0 -1 1
diff(rleid(x) %% 2) == 1
# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
c(TRUE, diff(rleid(x) %% 2) == 1)
# [1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
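If loading data.table is not an option, the run ids can also be computed in base R. The following is a sketch, assuming (as in the question) that the vector contains only two values and starts with "A", so that runs alternate and every pair of consecutive runs forms one set; `run_id` and `set_id` are names chosen here for illustration:

```r
x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A")

# A new run starts at the first element and wherever a value differs from its predecessor
run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))
run_id
# [1] 1 1 1 2 2 3 3 4 5

# Runs 1 and 2 form set 1, runs 3 and 4 form set 2, and so on
set_id <- (run_id + 1) %/% 2
set_id
# [1] 1 1 1 1 1 2 2 2 3
```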
Upvotes: 4
Reputation: 887108
We can use only base R methods:
x1 <- split(x, cumsum(c(TRUE, x[-1] != x[-length(x)])))
x2 <- sapply(x1, `[`, 1)
as.numeric(rep(ave(x2, x2, FUN = seq_along), lengths(x1)))
#[1] 1 1 1 1 1 2 2 2 3
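To see why this works, the intermediate values can be inspected one at a time. This is only a walk-through of the code above on the vector from the question; `runs` and `occ` are names introduced here, and `unname()` is used just to keep the printed output short:

```r
x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A")

# Run id: increases by one whenever the value changes
runs <- cumsum(c(TRUE, x[-1] != x[-length(x)]))
runs
# [1] 1 1 1 2 2 3 3 4 5

x1 <- split(x, runs)      # one list element per run
x2 <- sapply(x1, `[`, 1)  # the value of each run
unname(x2)
# [1] "A" "B" "A" "B" "A"

# Number the runs within each value: 1st "A" run, 1st "B" run, 2nd "A" run, ...
occ <- as.numeric(ave(x2, x2, FUN = seq_along))
occ
# [1] 1 1 2 2 3

# Repeat each run's counter by the run's length
rep(occ, lengths(x1))
# [1] 1 1 1 1 1 2 2 2 3
```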
Upvotes: 2
Reputation: 132706
Use rle:
x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A")
tmp <- rle(x)
#Run Length Encoding
# lengths: int [1:5] 3 2 2 1 1
# values : chr [1:5] "A" "B" "A" "B" "A"
Now change the values:
tmp$values <- ave(rep(1L, length(tmp$values)), tmp$values, FUN = cumsum)
and invert the run-length encoding:
y <- inverse.rle(tmp)
#[1] 1 1 1 1 1 2 2 2 3
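For reuse, the three steps can be wrapped in a small helper. This is a sketch only; the function name `set_counter` and the second test vector are made up here and do not come from the answer:

```r
# Hypothetical helper wrapping the rle() approach shown above
set_counter <- function(x) {
  r <- rle(x)
  # Replace each run's value by a running count of the runs with that value
  r$values <- ave(rep(1L, length(r$values)), r$values, FUN = cumsum)
  inverse.rle(r)
}

set_counter(c("A", "A", "A", "B", "B", "A", "A", "B", "A"))
# [1] 1 1 1 1 1 2 2 2 3

set_counter(c("A", "B", "B", "A", "A", "A", "B"))
# [1] 1 1 1 2 2 2 2
```

Because the counter is kept per value, the helper numbers the runs of each value separately rather than relying on the vector starting with "A".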
Upvotes: 11