Scott Hunter
Scott Hunter

Reputation: 19

R - sequence calculations both forward and backward looking

I have the following data frame:

id = c("A","A","A","A","A","A","B","B","B","B","B","B","C","C","C","C","C","C")
month = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)
amount = c(0,0,10,0,0,0,0,10,0,10,0,0,0,0,0,10,10,0)

df <- data.frame(id, month, amount)

What I need to do (by ID) is: Calculate (by way of a negative number) the difference in months between zero and non-zero "amount" rows until such time as the "amount" equals 0. When this happens, the time = 0. THEN, once an "amount" exceeds zero in the sequence, the calculation (by way of a positive number) will look back and calculate the difference in months between non-zero and the historical zero "amount" row.

The solution would look like:

solution = c(-2,-1,0,1,2,3,-1,0,1,0,1,2,-3,-2,-1,0,0,1)

As you can probably tell, its pretty tough to search for this multi-layered problem. Ideally the answer would be using data.table as i'm dealing with millions of rows, but dplyr would also suit my needs.

Any help appreciated.

S.

Upvotes: 1

Views: 299

Answers (2)

Frank
Frank

Reputation: 66819

library(data.table)
setDT(DT)

DT[, g := rleid(id, amount != 0)]
DT[, g_id := g - g[1L], by=id]
DT[, v :=  
  if (g_id == 0L) 
    -(.N:1)
  else if (g_id %% 2 == 0)
    1:.N
  else 
    0L
, by=.(id, g_id)]

all.equal(DT$v, solution) # TRUE

To see how it works:

    id month amount  g g_id  v
 1:  A     1      0  1    0 -2
 2:  A     2      0  1    0 -1
 3:  A     3     10  2    1  0
 4:  A     4      0  3    2  1
 5:  A     5      0  3    2  2
 6:  A     6      0  3    2  3
 7:  B     1      0  4    0 -1
 8:  B     2     10  5    1  0
 9:  B     3      0  6    2  1
10:  B     4     10  7    3  0
11:  B     5      0  8    4  1
12:  B     6      0  8    4  2
13:  C     1      0  9    0 -3
14:  C     2      0  9    0 -2
15:  C     3      0  9    0 -1
16:  C     4     10 10    1  0
17:  C     5     10 10    1  0
18:  C     6      0 11    2  1

You can drop the extra columns with DT[, c("g", "g_id") := NULL].

Upvotes: 2

emehex
emehex

Reputation: 10578

With tidyr and dplyr

library(dplyr)
library(tidyr)

df_new <- df %>% 
    group_by(id) %>% 
    # identify non-zero instances
    mutate(temp = ifelse(amount != 0, month, NA)) %>% 
    # fill down first
    fill(temp, .direction = "down") %>% 
    # fill up after
    fill(temp, .direction = "up") %>% 
    # calculate difference
    mutate(solution = month - temp) %>% 
    # remove temp
    select(-temp)

Result

#        id month amount solution
#     <fctr> <dbl>  <dbl>    <dbl>
# 1       A     1      0       -2
# 2       A     2      0       -1
# 3       A     3     10        0
# 4       A     4      0        1
# 5       A     5      0        2
# 6       A     6      0        3
# 7       B     1      0       -1
# 8       B     2     10        0
# 9       B     3      0        1
# 10      B     4     10        0
# 11      B     5      0        1
# 12      B     6      0        2
# 13      C     1      0       -3
# 14      C     2      0       -2
# 15      C     3      0       -1
# 16      C     4     10        0
# 17      C     5     10        0
# 18      C     6      0        1

Upvotes: 1

Related Questions