kam
kam

Reputation: 345

Cumulative count for a column using R

I got data like this

structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2), drug_1 = c(0, 
0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1), drug_2 = c(0, 1, 1, 1, 1, 0, 
1, 0, 0, 1, 0, 1)), class = "data.frame", row.names = c(NA, -12L
))

I would like to get the cumulative count of each column for each id and get the data like this

structure(list(id2 = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2), drug_1_b = c(0, 
0, 0, 0, 0, 1, 2, 0, 0, 1, 0, 2), drug_2_b = c(0, 1, 2, 3, 4, 
0, 5, 0, 0, 1, 0, 2)), class = "data.frame", row.names = c(NA, 
-12L))

Upvotes: 0

Views: 431

Answers (5)

Ronak Shah
Ronak Shah

Reputation: 388807

If you have binary values (1/0) in drug columns, you can multiply the cumulative sum with itself to get 0 for 0 values.

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(across(starts_with('drug'), ~cumsum(.) * .)) %>%
  ungroup

#      id drug_1 drug_2
#   <dbl>  <dbl>  <dbl>
# 1     1      0      0
# 2     1      0      1
# 3     1      0      2
# 4     1      0      3
# 5     1      0      4
# 6     1      1      0
# 7     1      2      5
# 8     2      0      0
# 9     2      0      0
#10     2      1      1
#11     2      0      0
#12     2      2      2

Upvotes: 0

hello_friend
hello_friend

Reputation: 5788

Base R solution:

# Resolve the names of vectors we want to cumulatively sum:
# drug_vec_names => character vector
drug_vec_names <- grep( "^drug\\_", colnames(df), value = TRUE)

# Resolve the names of vectors we want to keep: 
# not_drug_vec_names => character vector
not_drug_vec_names <- names(df)[!(names(df) %in% drug_vec_names)]

# Calculate the result: res => data.frame
res <- setNames(
  cbind(
    df[,not_drug_vec_names],
    replace(
      ave(
        df[,drug_vec_names], 
        df[,not_drug_vec_names],
        FUN = cumsum
      ),
      df[,drug_vec_names] == 0,
      0
    )
  ),
  c(not_drug_vec_names, drug_vec_names)
)

Upvotes: 0

Colombo
Colombo

Reputation: 609

You can get a cumulative sum with cumsum.

To split data.frame into subsets, you can use split and then lapply cumsum over the list of the data.frames and again over the list of the columns, or you can use the ave function which does exactly that:

data = structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2), drug_1 = c(0, 
0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1), drug_2 = c(0, 1, 1, 1, 1, 0, 
1, 0, 0, 1, 0, 1)), class = "data.frame", row.names = c(NA, -12L
))
data[-1] = ave(data[-1], data$id, FUN=cumsum)

edit: I assumed that the cumulative sum is requested (as per instructions) and that there is a mistake in the example data. If the example data is correct, then the condition is If the count is zero, don't do cumulative sum and leave at zero or ifelse(x == 0, 0, cumsum(x)) (as per @r2evans). However, this construct doesn't work when applied for the data.frame. A more complex helper function is required:

data[-1] = ave(data[-1], data$id, FUN=function(x){
    y = cumsum(x)
    y[x == 0] = 0
    y
    })

We can now compare it with the requested (renamed) data:

result = structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2), drug_1 = c(0, 
0, 0, 0, 0, 1, 2, 0, 0, 1, 0, 2), drug_2 = c(0, 1, 2, 3, 4, 
0, 5, 0, 0, 1, 0, 2)), class = "data.frame", row.names = c(NA, 
-12L))
identical(data, result)

Upvotes: 2

Martin Gal
Martin Gal

Reputation: 16978

Edit Simplified the solution after reading r2evans' approach.

You could use

library(dplyr)

df %>% 
  group_by(id) %>% 
  mutate(across(starts_with("drug"), 
                ~ifelse(.x == 0, 0, cumsum(.x)))) %>%
  ungroup()

This returns

# A tibble: 12 x 3
      id drug_1 drug_2
   <dbl>  <dbl>  <dbl>
 1     1      0      0
 2     1      0      1
 3     1      0      2
 4     1      0      3
 5     1      0      4
 6     1      1      0
 7     1      2      5
 8     2      0      0
 9     2      0      0
10     2      1      1
11     2      0      0
12     2      2      2

Upvotes: 2

r2evans
r2evans

Reputation: 160407

Base R,

ave(df$drug_2, df$id, FUN = function(z) ifelse(z == 0, z, cumsum(z)))
#  [1] 0 1 2 3 4 0 5 0 0 1 0 2

Upvotes: 1

Related Questions