Cumulative sum of factor variables

Question

I am trying to create a set of cumulative factor variables in R. My df has treatment dummies for 4 moments of time:

id t1 t2 t3 t4 
1   0  0  0  1 
2   1  0  0  0
3   0  0  0  1
4   0  1  0  0
5   1  0  0  0

What I want is a set of cumulative treatment variables (named tc in the following example) by time like this:

id tc1 tc2 tc3 tc4 
1   0  0  0  1 
2   1  1  1  1
3   0  0  0  1
4   0  1  1  1
5   1  1  1  1

I have tried the cumsum function, but I do not know how to handle this function for factor variables. Any idea of how to do this?

David Arenburg · Accepted Answer

Since the dummies are 0/1, what you want is a cumulative maximum across each row rather than a cumulative sum. One way is the matrixStats::rowCummaxs function, which requires converting to a matrix first. Judging by your data structure, though, I would recommend working with a matrix instead of a data.frame in the first place.

data1[-1] <- matrixStats::rowCummaxs(as.matrix(data1[-1]))
data1
#   id t1 t2 t3 t4
# 1  1  0  0  0  1
# 2  2  1  1  1  1
# 3  3  0  0  0  1
# 4  4  0  1  1  1
# 5  5  1  1  1  1
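
The call above assumes the t columns are numeric. If they are actually stored as factors, as the question's title suggests, cummax and similar functions will not accept them, so a conversion step is needed first. A minimal sketch, assuming the factor labels are the strings "0" and "1" (the data1 below is rebuilt from the question's table purely for illustration):

```r
# Illustrative data rebuilt from the question, with the dummies stored as factors
data1 <- data.frame(
  id = 1:5,
  t1 = factor(c(0, 1, 0, 0, 1)),
  t2 = factor(c(0, 0, 0, 1, 0)),
  t3 = factor(c(0, 0, 0, 0, 0)),
  t4 = factor(c(1, 0, 1, 0, 0))
)

# Go through as.character() so we recover the labels ("0"/"1"),
# not the internal level codes (1, 2, ...)
data1[-1] <- lapply(data1[-1], function(x) as.integer(as.character(x)))

str(data1)
```

Calling as.integer() directly on a factor would return the level codes instead of the labels, which is why the as.character() step matters here.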

Or the blatant apply by row approach (which also converts to a matrix)

data1[-1] <- t(apply(data1[-1], 1, cummax))
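
The t() is there because apply collects each row's result as a column of its output. A small self-contained sketch of that step, using a data1 rebuilt from the question's table (numeric columns assumed):

```r
# Illustrative data rebuilt from the question (numeric dummies assumed)
data1 <- data.frame(
  id = 1:5,
  t1 = c(0, 1, 0, 0, 1),
  t2 = c(0, 0, 0, 1, 0),
  t3 = c(0, 0, 0, 0, 0),
  t4 = c(1, 0, 1, 0, 0)
)

# apply() over rows returns one column per input row: a 4 x 5 matrix here
res <- apply(data1[-1], 1, cummax)
dim(res)

# Transposing restores the original 5 x 4 layout before assigning back
data1[-1] <- t(res)
data1
```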

Or, as @joran implied, we could try the long/wide transformation

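library(data.table)
# setDT() converts data1 to a data.table by reference
long <- melt(setDT(data1), id.vars = "id")
# Running maximum within each id, in t1..t4 order
long[, value := cummax(value), by = id]
dcast(long, id ~ variable, value.var = "value")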

#    id t1 t2 t3 t4
# 1:  1  0  0  0  1
# 2:  2  1  1  1  1
# 3:  3  0  0  0  1
# 4:  4  0  1  1  1
# 5:  5  1  1  1  1

Or the same long/wide idea with dplyr and tidyr

library(dplyr)
library(tidyr)
data1 %>%
  gather(variable, value, -id) %>%
  group_by(id) %>%
  mutate(value = cummax(value)) %>%
  spread(variable, value)

# Source: local data frame [5 x 5]
# Groups: id [5]
# 
#      id    t1    t2    t3    t4
#   (int) (int) (int) (int) (int)
# 1     1     0     0     0     1
# 2     2     1     1     1     1
# 3     3     0     0     0     1
# 4     4     0     1     1     1
# 5     5     1     1     1     1
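
Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same pipeline with the newer verbs, assuming tidyr 1.0.0 or later is installed; the data1 and the res name below are illustrative, rebuilt from the question's table:

```r
library(dplyr)
library(tidyr)

# Illustrative data rebuilt from the question (numeric dummies assumed)
data1 <- data.frame(
  id = 1:5,
  t1 = c(0, 1, 0, 0, 1),
  t2 = c(0, 0, 0, 1, 0),
  t3 = c(0, 0, 0, 0, 0),
  t4 = c(1, 0, 1, 0, 0)
)

res <- data1 %>%
  # One row per (id, time) pair
  pivot_longer(-id, names_to = "variable", values_to = "value") %>%
  group_by(id) %>%
  # Running maximum within each id, in t1..t4 order
  mutate(value = cummax(value)) %>%
  ungroup() %>%
  # Back to one row per id
  pivot_wider(names_from = variable, values_from = value)
res
```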

Or an interesting alternative by @alexis_laz: accumulating pmax across the columns using Reduce

data1[-1] <- Reduce(pmax, data1[-1], accumulate = TRUE)
data1
#   id t1 t2 t3 t4
# 1  1  0  0  0  1
# 2  2  1  1  1  1
# 3  3  0  0  0  1
# 4  4  0  1  1  1
# 5  5  1  1  1  1
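
This works because a data frame is a list of columns: Reduce with accumulate = TRUE returns t1, then pmax(t1, t2), then pmax of that with t3, and so on, one list element per column. A sketch that stores the result under the tc1 to tc4 names from the question and leaves the original dummies in place (the tcols and running names and the rebuilt data1 are illustrative choices):

```r
# Illustrative data rebuilt from the question (numeric dummies assumed)
data1 <- data.frame(
  id = 1:5,
  t1 = c(0, 1, 0, 0, 1),
  t2 = c(0, 0, 0, 1, 0),
  t3 = c(0, 0, 0, 0, 0),
  t4 = c(1, 0, 1, 0, 0)
)

tcols <- paste0("t", 1:4)

# Running element-wise maximum across the columns: a list of 4 vectors
running <- Reduce(pmax, data1[tcols], accumulate = TRUE)

# Add the cumulative versions as new columns tc1..tc4
data1[paste0("tc", 1:4)] <- running
data1
```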
