Function has unexpected behaviour (mutate function in dplyr)

Question

I expected a cumulative sum in the "n_rest" column, but I only get a copy of the "n_i" column. The problem goes away if I uncomment the "# as.data.frame() %>%" line, but I don't like this solution and would like to understand what I am doing wrong.

Thanks in advance!

library(dplyr)

t      <- c(42,57,63,98,104,105,132,132,132,133,133,133,139,140,161,180,180,195,195,233)
status <- c(1 ,1 ,1 ,1 ,0  ,1  ,1  ,1  ,1  ,1  ,1  ,1  ,1  ,1  ,1  ,1  ,1  ,1  ,1  ,  0)

KMP <- function(time,status){

  # use the function argument `time` rather than the global vector `t`
  n_ges = length(time)

  df <- data.frame(t = time, status = status, n = 1)
  df <- df %>%  group_by(t,status) %>%
                summarise(n_i = sum(n)) %>%
                # as.data.frame() %>%
                mutate(n_rest = rev(cumsum(n_i)))

  df

}

Spacedman · Accepted Answer

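The mutate is still working on the groups: summarise() only removes the last grouping variable (status), so the result is still grouped by t. Each t group holds a single row here, so cumsum(n_i) within a group just returns n_i.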
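A quick way to check this is to ask the summarised result which grouping variables it still carries. This is a minimal sketch using the data from the question; it assumes dplyr 1.0.0 or later, where summarise() accepts a .groups argument. Passing "drop_last" only spells out the default behaviour and silences the message. The name counts is just for illustration.

```r
library(dplyr)

t      <- c(42,57,63,98,104,105,132,132,132,133,133,133,139,140,161,180,180,195,195,233)
status <- c(1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0)

# Same pipeline as in the question, stopped right after summarise().
# .groups = "drop_last" is the default behaviour, written out explicitly.
counts <- data.frame(t = t, status = status, n = 1) %>%
  group_by(t, status) %>%
  summarise(n_i = sum(n), .groups = "drop_last")

group_vars(counts)   # "t": the result is still grouped by t

# A grouped mutate() computes cumsum() separately inside each t group,
# and every group has exactly one row, so n_rest is just a copy of n_i.
grouped <- counts %>% mutate(n_rest = cumsum(n_i))
identical(grouped$n_rest, grouped$n_i)   # TRUE
```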

Passing the result through as.data.frame() drops the grouping, which is why that workaround helps. Alternatively, reset the grouping by putting an empty group_by() in the pipe (ungroup() does the same thing):

> df %>% group_by(t,status) %>% summarise(n_i=sum(n)) %>% group_by() %>% mutate(n_rest=cumsum(n_i))
# A tibble: 14 x 4
       t status   n_i n_rest
   <dbl>  <dbl> <dbl>  <dbl>
 1    42      1     1      1
 2    57      1     1      2
 3    63      1     1      3
 4    98      1     1      4
 5   104      0     1      5
 6   105      1     1      6
 7   132      1     3      9
 8   133      1     3     12
 9   139      1     1     13
10   140      1     1     14
11   161      1     1     15
12   180      1     2     17
13   195      1     2     19
14   233      0     1     20
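
Two other ways to get the same ungrouped result are sketched below, again assuming dplyr 1.0.0 or later for the .groups argument. The names res_ungroup and res_drop are only illustrative. If the reversed order from the question is wanted, wrap the cumsum() call in rev() as in the original code.

```r
library(dplyr)

t      <- c(42,57,63,98,104,105,132,132,132,133,133,133,139,140,161,180,180,195,195,233)
status <- c(1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0)

df <- data.frame(t = t, status = status, n = 1)

# Option 1: drop the remaining grouping explicitly before mutate()
res_ungroup <- df %>%
  group_by(t, status) %>%
  summarise(n_i = sum(n), .groups = "drop_last") %>%
  ungroup() %>%
  mutate(n_rest = cumsum(n_i))

# Option 2: ask summarise() to return an ungrouped tibble directly
res_drop <- df %>%
  group_by(t, status) %>%
  summarise(n_i = sum(n), .groups = "drop") %>%
  mutate(n_rest = cumsum(n_i))

# Both give the running total over all rows; the last value is the
# total number of observations.
tail(res_drop$n_rest, 1)   # 20
```

Option 2 keeps the pipeline one step shorter, while ungroup() makes the intent visible to a reader who does not know the .groups argument.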
