Reputation: 2035
I have the following data:
library(dplyr, warn.conflicts = FALSE)
df <- tibble(
x = c(30, 60, 90, 30, 60, 90),
phase = c(rep(c("phase 1", "phase 2"), each = 3))
)
df
#> # A tibble: 6 x 2
#> x phase
#> <dbl> <chr>
#> 1 30 phase 1
#> 2 60 phase 1
#> 3 90 phase 1
#> 4 30 phase 2
#> 5 60 phase 2
#> 6 90 phase 2
Created on 2020-08-11 by the reprex package (v0.3.0)
Where x
is the elapsed time (in seconds) within each phase
. Since phase
is something that happens continuously, I am interested in calculating the total elapsed time.
Desired output:
#> # A tibble: 6 x 3
#> x phase elapsed_time
#> <dbl> <chr> <dbl>
#> 1 30 phase 1 30
#> 2 60 phase 1 60
#> 3 90 phase 1 90
#> 4 30 phase 2 120
#> 5 60 phase 2 150
#> 6 90 phase 2 180
Any ideas? Please, note that my real example has much more phases.
Upvotes: 0
Views: 155
Reputation: 9933
Here is another dplyr
solution. It finds the start of each phase, and adds this to x
library(tidyverse)
df <- tibble(
x = c(30, 60, 90, 30, 60, 90),
phase = c(rep(c("phase 1", "phase 2"), each = 3))
)
df %>% group_by(phase) %>%
nest() %>%
mutate(start = map_dbl(data, max)) %>%
ungroup() %>%
mutate(start = lag(start, default = 0)) %>%
unnest(data) %>%
mutate(elapsed_time = start + x) %>%
select(-start)
#> # A tibble: 6 x 3
#> phase x elapsed_time
#> <chr> <dbl> <dbl>
#> 1 phase 1 30 30
#> 2 phase 1 60 60
#> 3 phase 1 90 90
#> 4 phase 2 30 120
#> 5 phase 2 60 150
#> 6 phase 2 90 180
Created on 2020-08-11 by the reprex package (v0.3.0)
Upvotes: 1
Reputation: 41
I believe the following post has the answer you're looking for:
How to add a cumulative column to an R dataframe using dplyr?
It shows how to create a cumulative column using group_by
and mutate
. It appears you want the elapsed time to sum across both phases, so simply do not include the group_by
call in your code.
Upvotes: 4
Reputation: 51622
Here is an idea via dplyr
. First we need to group by the phase
and get the time differences for each. We then ungroup()
and take the cumsum()
as a total, i.e.
library(dplyr)
df %>%
group_by(phase) %>%
mutate(diffs = x - lag(x, default = 0)) %>%
ungroup() %>%
mutate(res = cumsum(diffs)) %>%
select(-diffs)
# A tibble: 6 x 3
# x phase res
# <dbl> <chr> <dbl>
#1 30 phase 1 30
#2 60 phase 1 60
#3 90 phase 1 90
#4 30 phase 2 120
#5 60 phase 2 150
#6 90 phase 2 180
Upvotes: 3