B C

Reputation: 318

Finding sequences in a data frame in R

This is a hypothetical data frame:

a <- c(1:10)
b <- sample(seq(from = 0, to = 1, by = 1), size = 10, replace = TRUE)
data <- data.frame(a, b)

The output will look something like this:

    a b
1   1 1
2   2 1
3   3 0
4   4 0
5   5 1
6   6 1
7   7 1
8   8 1
9   9 1
10 10 0

I would like to create a new column (c) that counts how many times in a row the current b value has already appeared immediately before the current row:

    a b c
1   1 1 0
2   2 1 1
3   3 0 0
4   4 0 1
5   5 1 0
6   6 1 1
7   7 1 2
8   8 1 3
9   9 1 4
10 10 0 0

I think it may require an if statement that says something like: if b == shift(b) then c = previous c + 1, else c = 0. I am fairly new to R, though, so I am not sure how to implement such a procedure. Any help would be greatly appreciated.
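The rule described above can be written out literally as a loop in base R. This is only a minimal sketch of that idea, not a tuned solution: count_repeats is a hypothetical helper name, and b is fixed to the values from the example table so the result can be compared with the desired column c.

```r
# Values of b copied from the example table in the question.
b <- c(1, 1, 0, 0, 1, 1, 1, 1, 1, 0)

# Hypothetical helper: for each position, count how many times in a row
# the same value has already occurred directly before it.
count_repeats <- function(x) {
  out <- integer(length(x))            # starts as all zeros
  for (i in seq_along(x)[-1]) {        # skip the first element, it has no predecessor
    if (x[i] == x[i - 1]) {
      out[i] <- out[i - 1] + 1L        # same value as the previous row: extend the run
    }                                  # otherwise leave 0: a new run starts here
  }
  out
}

data <- data.frame(a = seq_along(b), b = b)
data$c <- count_repeats(data$b)
data$c
# [1] 0 1 0 1 0 1 2 3 4 0
```

The loop makes the logic explicit, but for long vectors a vectorised approach is usually preferable.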

Edit: working towards a solution:

library(data.table)
data <- data.table(data)
data[, c := b + shift(b)]

This code creates a column (c) that adds the previous value of b to the current one. The first row is NA because it has no previous value.
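Adding shift(b) to b does not yet give a running count, because the sum only looks one row back. One possible way to get there while staying in data.table is to group the rows by run and number them within each run. The sketch below assumes the data.table package is installed and again uses the b values from the example table; rleid() is data.table's run-id helper.

```r
library(data.table)

# Values of b copied from the example table in the question.
data <- data.table(a = 1:10, b = c(1, 1, 0, 0, 1, 1, 1, 1, 1, 0))

# rleid(b) gives every run of identical consecutive values its own id;
# within each run, seq_len(.N) - 1L counts 0, 1, 2, ...
data[, c := seq_len(.N) - 1L, by = rleid(b)]

data$c
# [1] 0 1 0 1 0 1 2 3 4 0
```

Grouping by rleid(b) rather than by b itself matters here: grouping by b would pool all rows with the same value, even when they belong to different runs.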

Upvotes: 2

Views: 1017

Answers (1)

SimonG

Reputation: 4881

If you're not set on using data.table, have a look at rle, which computes the lengths of runs of identical consecutive values.

set.seed(123)

a <- c(1:10)
b <- sample(seq(from = 0, to = 1, by =1), size = 10, replace = TRUE)
data <- data.frame(a,b)

len <- rle(data$b)$lengths
# lapply always returns a list, so this also works when all runs have the same length
data$c <- unlist(lapply(len, seq_len)) - 1

This gives:

# > data
#     a b c
# 1   1 0 0
# 2   2 1 0
# 3   3 0 0
# 4   4 1 0
# 5   5 1 1
# 6   6 0 0
# 7   7 1 0
# 8   8 1 1
# 9   9 1 2
# 10 10 0 0
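To see what rle contributes, it may help to run the same idea on the b values from the question's example table, where the expected c column is known. This is a sketch rather than a replacement for the code above; it uses base R's sequence() as a shorthand for building 1..n for each run length.

```r
# Values of b copied from the example table in the question.
b <- c(1, 1, 0, 0, 1, 1, 1, 1, 1, 0)

runs <- rle(b)
runs$lengths
# [1] 2 2 5 1
runs$values
# [1] 1 0 1 0

# sequence(c(2, 2, 5, 1)) concatenates 1:2, 1:2, 1:5 and 1:1;
# subtracting 1 turns each position within a run into a repeat count.
c_col <- sequence(runs$lengths) - 1L
c_col
# [1] 0 1 0 1 0 1 2 3 4 0
```

The result matches the c column requested in the question.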

Upvotes: 2
