rastrast
rastrast

Reputation: 327

Create index column based on sequence of values in column in R

I am working with a very large data.table in R and am trying to create an index column that is based on a sequence of values in another column - or better yet the reappearance of a value in a column. Below is an example with example code:

temp = data.table(
  col1 = c("A","A","A","A","A","B","B","B", "B", "B", "B"),
  col2 = c(1,   0,  0,  1,  0,  1,  0,  1,   0,   0,   1)
)

This produces a dataset that looks like this:

col1  col2
A     1         
A     0         
A     0         
A     1         
A     0         
B     1         
B     0         
B     1         
B     0         
B     0 
B     1

What I need is to create an index column (preferably using data.table terminology) that looks like this:

col1  col2  col3
A     1     1       
A     0     1       
A     0     1       
A     1     2       
A     0     2       
B     1     3       
B     0     3       
B     1     4       
B     0     4       
B     0     4   
B     1     5

I'm new to using data.tables and haven't been able to find anything on slack or other various help sites that give clues on how to create an index column based on reappearing values in another column. Any help is appreciated!

Upvotes: 1

Views: 252

Answers (1)

Maurits Evers
Maurits Evers

Reputation: 50738

Unless I misunderstood, this seems to be a simple matter of (base R's) cumsum?

temp[, col3 := cumsum(col2)]
#    col1 col2 col3
# 1:    A    1    1
# 2:    A    0    1
# 3:    A    0    1
# 4:    A    1    2
# 5:    A    0    2
# 6:    B    1    3
# 7:    B    0    3
# 8:    B    1    4
# 9:    B    0    4
#10:    B    0    4
#11:    B    1    5

Upvotes: 2

Related Questions