Finding sequences in a data frame in R

Question

This is a hypothetical data frame:

a <- c(1:10)
b <- sample(seq(from = 0, to = 1, by = 1), size = 10, replace = TRUE)
data <- data.frame(a,b)

The output will look something like this:

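I would like to create a new column (c) that counts how many times in a row the current b value has already occurred immediately before the current row, i.e. the position within the current run of identical b values, starting at 0.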

I think it may require an if statement that says something like: if b == shift(b), then c increases by 1, else c = 0. I am fairly new to R, though, so I am not quite sure how to implement such a procedure. Any help would be greatly appreciated.

Edit: working towards a solution:

library(data.table)
data <- data.table(data)
data[, c := b + shift(b)]

This code creates a column (c) that adds the previous b value to the current b value.
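As a minimal illustration of what shift() contributes here: the sketch below hard-codes b instead of sampling it (an assumption, for reproducibility) and uses a hypothetical helper column prev. The first row has no predecessor, so shift() returns NA there and the sum is NA as well.

```r
library(data.table)

# b is hard-coded for illustration instead of being drawn with sample()
dt <- data.table(a = 1:5, b = c(0, 1, 1, 0, 1))

# shift(b) lags b by one row; the first row has no previous value, so it is NA
dt[, prev := shift(b)]   # 'prev' is a hypothetical helper column
dt[, c := b + prev]

print(dt)
```

So this gives a sum over neighbouring rows rather than a count of repeats, which is why it does not yet answer the question.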

SimonG · Accepted Answer

If you're not fixated on using data.table, you can have a look at rle.
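To see what rle (run-length encoding, base R) returns, here is a small sketch on a hard-coded vector; the values are an assumption chosen for illustration, not sampled data.

```r
# Hard-coded example vector (illustrative values)
b <- c(0, 1, 0, 1, 1, 0, 1, 1, 1, 0)

# rle() collapses the vector into runs of identical consecutive values
runs <- rle(b)

runs$lengths  # how long each run is:  1 1 1 2 1 3 1
runs$values   # which value each run holds: 0 1 0 1 0 1 0
```

Only the lengths are needed for the counter: within each run, the rows are simply numbered from 0.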

set.seed(123)

a <- c(1:10)
b <- sample(seq(from = 0, to = 1, by =1), size = 10, replace = TRUE)
data <- data.frame(a,b)

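# rle() gives the length of each run of identical consecutive values in b
len <- rle(data$b)$lengths
# Number the rows 0, 1, 2, ... within each run; lapply() always returns a list,
# whereas sapply() would simplify to a matrix when all runs have the same length
data$c <- unlist(lapply(len, seq_len)) - 1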

This gives:

# > data
#     a b c
# 1   1 0 0
# 2   2 1 0
# 3   3 0 0
# 4   4 1 0
# 5   5 1 1
# 6   6 0 0
# 7   7 1 0
# 8   8 1 1
# 9   9 1 2
# 10 10 0 0
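Since the question started out with data.table, the same counter can also be sketched there. This is an alternative to the rle approach, not part of the original answer. It assumes the data.table package is installed, and it hard-codes b to the values shown in the output above, because what sample() draws for a given seed can differ between R versions (the default sampling algorithm changed in R 3.6.0).

```r
library(data.table)

# b is hard-coded to the values from the output above (assumption for reproducibility)
dt <- data.table(a = 1:10, b = c(0, 1, 0, 1, 1, 0, 1, 1, 1, 0))

# rleid(b) gives every run of identical consecutive b values its own id;
# within each run, number the rows starting from 0
dt[, c := seq_len(.N) - 1L, by = rleid(b)]

print(dt)
```

Grouping by rleid(b) rather than by b itself matters: grouping by b would count all earlier occurrences of a value, not just the consecutive ones.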
