Christian

Reputation: 243

Count repetitions of a set of characters

How can I count repetitions of a set of characters in a vector? Imagine the following vector consisting of "A" and "B":

x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A")

In this example, the first set is the sequence of "A" and "B" from index 1 to 5, the second set is the sequence of "A" and "B" from index 6 to 8, and the third set is the final single "A":

x <- c("A", "A", "A", "B", "B", # set 1
       "A", "A", "B",           # set 2
       "A")                     # set 3

How can I set a counter for each of these sets? I need a vector like this:

c(1, 1, 1, 1, 1, 2, 2, 2, 3)  

Thanks.

Upvotes: 8

Views: 387

Answers (3)

Henrik

Reputation: 67778

Alternative 1.

cumsum(c(TRUE, diff(match(x,  c("A", "B"))) == -1))
# [1] 1 1 1 1 1 2 2 2 3

Step by step:

match(x,  c("A", "B"))
# [1] 1 1 1 2 2 1 1 2 1

diff(match(x,  c("A", "B")))
# [1]  0  0  1  0 -1  0  1 -1

diff(match(x,  c("A", "B"))) == -1
# [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE

c(TRUE, diff(match(x,  c("A", "B"))) == -1)
# [1]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
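The same idea can be wrapped in a small function. This is only a sketch: `set_counter` and its `levels` argument are hypothetical names, and it tests `diff(...) < 0` instead of `== -1`, which is equivalent for two levels and also starts a new set on any drop in position when there are more. It assumes every value of the vector appears in `levels`.

```r
# Hypothetical helper: a new set starts whenever the position of the
# value within `levels` drops compared with the previous element.
set_counter <- function(v, levels = c("A", "B")) {
  pos <- match(v, levels)
  cumsum(c(TRUE, diff(pos) < 0))
}

x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A")
set_counter(x)
# [1] 1 1 1 1 1 2 2 2 3
```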

Alternative 2.

Using data.table::rleid:

library(data.table)
cumsum(c(TRUE, diff(rleid(x) %% 2) == 1))
# [1] 1 1 1 1 1 2 2 2 3

Step by step:

rleid(x)
# [1] 1 1 1 2 2 3 3 4 5

rleid(x) %% 2
# [1] 1 1 1 0 0 1 1 0 1

diff(rleid(x) %% 2)
# [1]  0  0 -1  0  1  0 -1  1

diff(rleid(x) %% 2) == 1
# [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE

c(TRUE, diff(rleid(x) %% 2) == 1)
# [1]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
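If data.table is not available, the run ids can probably be rebuilt in base R from `rle()`. A minimal sketch, where `run_id` is a hypothetical helper name and not part of any package. Like the parity trick above, it assumes the vector holds only two distinct values.

```r
x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A")

# Hypothetical helper: number consecutive runs of equal values,
# as data.table::rleid() does for a single vector.
run_id <- function(v) {
  r <- rle(v)
  rep(seq_along(r$lengths), r$lengths)
}

ids <- run_id(x)
ids
# [1] 1 1 1 2 2 3 3 4 5

cumsum(c(TRUE, diff(ids %% 2) == 1))
# [1] 1 1 1 1 1 2 2 2 3
```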

Upvotes: 4

akrun

Reputation: 887108

This can be done with base R alone:

x1 <- split(x, cumsum(c(TRUE, x[-1] != x[-length(x)])))
x2 <- sapply(x1, `[`, 1)
as.numeric(rep(ave(x2, x2, FUN = seq_along), lengths(x1)))
#[1] 1 1 1 1 1 2 2 2 3
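Step by step, the intermediate values look like this. The names `run_start` and `occ` are introduced here only for illustration.

```r
x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A")

# TRUE wherever a value differs from its predecessor, i.e. a new run starts
run_start <- c(TRUE, x[-1] != x[-length(x)])
cumsum(run_start)
# [1] 1 1 1 2 2 3 3 4 5

# One list element per run
x1 <- split(x, cumsum(run_start))
unname(lengths(x1))
# [1] 3 2 2 1 1

# First value of each run
x2 <- sapply(x1, `[`, 1)
unname(x2)
# [1] "A" "B" "A" "B" "A"

# Occurrence number of each run's value; ave() returns character here
# because x2 is character, hence the as.numeric() in the final step
occ <- ave(x2, x2, FUN = seq_along)
unname(occ)
# [1] "1" "1" "2" "2" "3"
```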

Upvotes: 2

Roland

Reputation: 132706

Use rle:

x <- c("A", "A", "A", "B", "B", "A", "A", "B", "A")  
tmp <- rle(x)
tmp
#Run Length Encoding
#  lengths: int [1:5] 3 2 2 1 1
#  values : chr [1:5] "A" "B" "A" "B" "A"

Now change the values:

tmp$values <- ave(rep(1L, length(tmp$values)), tmp$values, FUN = cumsum) 

and invert the run-length encoding:

y <- inverse.rle(tmp)
y
#[1] 1 1 1 1 1 2 2 2 3
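For reuse, the rle() steps above could be wrapped in a function. A minimal sketch; `count_sets` is a hypothetical name:

```r
# Hypothetical helper wrapping the rle() approach
count_sets <- function(v) {
  r <- rle(v)
  # For each run, count how many runs of the same value have occurred so far
  r$values <- ave(rep(1L, length(r$values)), r$values, FUN = cumsum)
  inverse.rle(r)
}

count_sets(c("A", "A", "A", "B", "B", "A", "A", "B", "A"))
# [1] 1 1 1 1 1 2 2 2 3
```

Because each run is numbered by how often its value has been seen so far, the order of the values within a set does not matter: `count_sets(c("B", "A", "B"))` gives 1 1 2.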

Upvotes: 11
