Corvus
Corvus

Reputation: 8059

Add a countdown column to data.table containing rows until a special row encountered

I have a data.table with ordered data labled up, and I want to add a column that tells me how many records until I get to a "special" record that resets the countdown.

For example:

DT = data.table(idx = c(1,3,3,4,6,7,7,8,9), 
                name = c("a", "a", "a", "b", "a", "a", "b", "a", "b"))
setkey(DT, idx)
#manually add the answer
DT[, countdown := c(3,2,1,0,2,1,0,1,0)]

Gives

> DT
   idx name countdown
1:   1    a         3
2:   3    a         2
3:   3    a         1
4:   4    b         0
5:   6    a         2
6:   7    a         1
7:   7    b         0
8:   8    a         1
9:   9    b         0

See how the countdown column tells me how many rows until a row called "b". The question is how to create that column in code.

Note that the key is not evenly spaced and may contain duplicates (so is not very useful in solving the problem). In general the non-b names could be different, but I could add a dummy column that is just True/False if the solution requires this.

Upvotes: 6

Views: 515

Answers (3)

Arun
Arun

Reputation: 118849

Here's a mix of Josh's and Ananda's solution, in that, I use RLE to generate the way Josh has given the answer:

t <- rle(DT$name)
t <- t$lengths[t$values == "a"]
DT[, cd := rep(t, t+1)]
DT[, cd:=max(.I) - .I, by=cd]

Even better: Taking use of the fact that there's only one b always (or assuming here), you could do this one better:

t <- rle(DT$name)
t <- t$lengths[t$values == "a"]
DT[, cd := rev(sequence(rev(t+1)))-1]

Edit: From OP's comment, it seems clear that there is more than 1 b possible and in such cases, all b should be 0. The first step in doing this is to create groups where b ends after each consecutive a's.

DT <- data.table(idx=sample(10), name=c("a","a","a","b","b","a","a","b","a","b"))
t <- rle(DT$name)
val <- cumsum(t$lengths)[t$values == "b"]
DT[, grp := rep(seq(val), c(val[1], diff(val)))]
DT[, val := c(rev(seq_len(sum(name == "a"))), 
         rep(0, sum(name == "b"))), by = grp]

#     idx name grp val
#  1:   1    a   1   3
#  2:   7    a   1   2
#  3:   9    a   1   1
#  4:   4    b   1   0
#  5:   2    b   1   0
#  6:   8    a   2   2
#  7:   6    a   2   1
#  8:   3    b   2   0
#  9:  10    a   3   1
# 10:   5    b   3   0

Upvotes: 3

Josh O&#39;Brien
Josh O&#39;Brien

Reputation: 162401

Here's another idea:

## Create groups that end at each occurrence of "b"
DT[, cd:=0L]
DT[name=="b", cd:=1L]
DT[, cd:=rev(cumsum(rev(cd)))]
## Count down within them
DT[, cd:=max(.I) - .I, by=cd]
#    idx name cd
# 1:   1    a  3
# 2:   3    a  2
# 3:   3    a  1
# 4:   4    b  0
# 5:   6    a  2
# 6:   7    a  1
# 7:   7    b  0
# 8:   8    a  1
# 9:   9    b  0

Upvotes: 7

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193647

I'm sure (or at least hopeful) that a purely "data.table" solution would be generated, but in the meantime, you could make use of rle. In this case, you're interested in reversing the countdown, so we'll use rev to reverse the "name" values before proceeding.

output <- sequence(rle(rev(DT$name))$lengths)
makezero <- cumsum(rle(rev(DT$name))$lengths)[c(TRUE, FALSE)]
output[makezero] <- 0

DT[, countdown := rev(output)]
DT
#    idx name countdown
# 1:   1    a         3
# 2:   3    a         2
# 3:   3    a         1
# 4:   4    b         0
# 5:   6    a         2
# 6:   7    a         1
# 7:   7    b         0
# 8:   8    a         1
# 9:   9    b         0

Upvotes: 6

Related Questions