topcat

Reputation: 586

Cumulative sum of factor variables

I am trying to create a set of cumulative factor variables in R. My data frame has treatment dummies for four points in time:

id t1 t2 t3 t4 
1   0  0  0  1 
2   1  0  0  0
3   0  0  0  1
4   0  1  0  0
5   1  0  0  0

What I want is a set of cumulative treatment variables (named tc in the following example) by time like this:

id tc1 tc2 tc3 tc4 
1   0  0  0  1 
2   1  1  1  1
3   0  0  0  1
4   0  1  1  1
5   1  1  1  1

I have tried the cumsum function, but I do not know how to apply it to factor variables. Any idea how to do this?

Upvotes: 1

Views: 631

Answers (2)

David Arenburg

Reputation: 92302

One way is to use the matrixStats::rowCummaxs function, but you will need to convert to a matrix first. That said, judging by your data structure, I would recommend working with a matrix instead of a data.frame in the first place.

data1[-1] <- matrixStats::rowCummaxs(as.matrix(data1[-1]))
data1
#   id t1 t2 t3 t4
# 1  1  0  0  0  1
# 2  2  1  1  1  1
# 3  3  0  0  0  1
# 4  4  0  1  1  1
# 5  5  1  1  1  1
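
The question mentions factor variables, and cummax is not defined for factors, so the columns would need converting to numeric first. A minimal sketch, assuming the dummies are stored as factors with labels "0" and "1" (the question does not show the column types); base R's cummax applied by row stands in for rowCummaxs here so that no package is needed:

```r
# Sample data from the question, with the dummies stored as factors (an assumption)
data1 <- data.frame(
  id = 1:5,
  t1 = factor(c(0, 1, 0, 0, 1)),
  t2 = factor(c(0, 0, 0, 1, 0)),
  t3 = factor(c(0, 0, 0, 0, 0)),
  t4 = factor(c(1, 0, 1, 0, 0))
)

# Convert each factor to its numeric label (not its internal level code)
data1[-1] <- lapply(data1[-1], function(x) as.numeric(as.character(x)))

# Row-wise cumulative maximum; apply() returns the rows as columns, hence t()
data1[-1] <- t(apply(data1[-1], 1, cummax))
data1
```

Going through as.character matters: calling as.numeric directly on a factor returns the level codes (1, 2, ...) rather than the labels.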

Or the blatant apply-by-row approach (which also converts to a matrix):

data1[-1] <- t(apply(data1[-1], 1, cummax))

Or, as @joran implied, we could try a long/wide transformation:

library(data.table)
# melt to long format, take the cumulative max within each id, then cast back to wide
dcast(
  melt(setDT(data1), id.vars = "id")[, value := cummax(value), by = id],
  id ~ variable
)

#    id t1 t2 t3 t4
# 1:  1  0  0  0  1
# 2:  2  1  1  1  1
# 3:  3  0  0  0  1
# 4:  4  0  1  1  1
# 5:  5  1  1  1  1

Or

library(dplyr)
library(tidyr)
data1 %>%
  gather(variable, value, -id) %>%
  group_by(id) %>%
  mutate(value = cummax(value)) %>%
  spread(variable, value)

# Source: local data frame [5 x 5]
# Groups: id [5]
# 
#      id    t1    t2    t3    t4
#   (int) (int) (int) (int) (int)
# 1     1     0     0     0     1
# 2     2     1     1     1     1
# 3     3     0     0     0     1
# 4     4     0     1     1     1
# 5     5     1     1     1     1

Or an interesting alternative by @alexis_laz, accumulating pmax across the columns with Reduce:

data1[-1] <- Reduce(pmax, data1[-1], accumulate = TRUE)
data1
#   id t1 t2 t3 t4
# 1  1  0  0  0  1
# 2  2  1  1  1  1
# 3  3  0  0  0  1
# 4  4  0  1  1  1
# 5  5  1  1  1  1
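
To see why the Reduce version works, it may help to look at the intermediate list. A small sketch, assuming numeric 0/1 columns as in the question's sample data (the name `running` is only for illustration):

```r
data1 <- data.frame(
  id = 1:5,
  t1 = c(0, 1, 0, 0, 1),
  t2 = c(0, 0, 0, 1, 0),
  t3 = c(0, 0, 0, 0, 0),
  t4 = c(1, 0, 1, 0, 0)
)

# Each list element is the element-wise maximum of all columns up to that point:
# t1, pmax(t1, t2), pmax(pmax(t1, t2), t3), ...
running <- Reduce(pmax, data1[-1], accumulate = TRUE)
str(running)

# The list holds one vector per column, so it can be assigned back directly
data1[-1] <- running
data1
```

Because pmax is vectorised over rows, this loops over the four columns rather than over the rows.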

Upvotes: 4

thelatemail

Reputation: 93938

max.col to the rescue:

df[-1][col(df[-1]) >= max.col(df[-1], ties.method="first")] <- 1
df

#  id t1 t2 t3 t4
#1  1  0  0  0  1
#2  2  1  1  1  1
#3  3  0  0  0  1
#4  4  0  1  1  1
#5  5  1  1  1  1

And some more detailed explanation of how this works:

col(df[-1])
#     [,1] [,2] [,3] [,4]
#[1,]    1    2    3    4
#[2,]    1    2    3    4
#[3,]    1    2    3    4
#[4,]    1    2    3    4
#[5,]    1    2    3    4

max.col(df[-1], ties.method="first")
#[1] 4 1 4 2 1

col(df[-1]) >= max.col(df[-1], ties.method="first")
#      [,1]  [,2]  [,3] [,4]
#[1,] FALSE FALSE FALSE TRUE
#[2,]  TRUE  TRUE  TRUE TRUE
#[3,] FALSE FALSE FALSE TRUE
#[4,] FALSE  TRUE  TRUE TRUE
#[5,]  TRUE  TRUE  TRUE TRUE
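
One caveat worth checking against your own data: for a row with no 1 at all, every column ties for the maximum, so ties.method="first" returns column 1 and the whole row would be filled with 1s. The sample data has no such row, so this is a hedged sketch on a made-up three-row example (the names `m`, `first_max`, `treated` and `fill` are illustrative only), guarding with rowSums:

```r
df <- data.frame(
  id = 1:3,
  t1 = c(0, 1, 0),
  t2 = c(0, 0, 0),
  t3 = c(0, 0, 1)
)

m <- as.matrix(df[-1])
first_max <- max.col(m, ties.method = "first")
first_max  # row 1 is all zeros, so its "first maximum" is column 1

# Only fill rows that actually contain a 1
treated <- rowSums(m) > 0
fill <- (col(m) >= first_max) & treated
df[-1][fill] <- 1
df
```

With the guard, the never-treated row stays all zeros while the other rows are filled from their first 1 onwards.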

Upvotes: 3
