Reputation: 327
I am working with a very large data.table in R and am trying to create an index column that is based on a sequence of values in another column - or better yet the reappearance of a value in a column. Below is an example with example code:
temp = data.table(
col1 = c("A","A","A","A","A","B","B","B", "B", "B", "B"),
col2 = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1)
)
This produces a dataset that looks like this:
col1 col2
A 1
A 0
A 0
A 1
A 0
B 1
B 0
B 1
B 0
B 0
B 1
What I need is to create an index column (preferably using data.table terminology) that looks like this:
col1 col2 col3
A 1 1
A 0 1
A 0 1
A 1 2
A 0 2
B 1 3
B 0 3
B 1 4
B 0 4
B 0 4
B 1 5
I'm new to using data.tables and haven't been able to find anything on slack or other various help sites that give clues on how to create an index column based on reappearing values in another column. Any help is appreciated!
Upvotes: 1
Views: 252
Reputation: 50738
Unless I misunderstood, this seems to be a simple matter of (base R's) cumsum
?
temp[, col3 := cumsum(col2)]
# col1 col2 col3
# 1: A 1 1
# 2: A 0 1
# 3: A 0 1
# 4: A 1 2
# 5: A 0 2
# 6: B 1 3
# 7: B 0 3
# 8: B 1 4
# 9: B 0 4
#10: B 0 4
#11: B 1 5
Upvotes: 2