Reputation: 1712
I stumbled over what looked to me like a straightforward issue.
I have a data frame like this:
d <- data.frame(x=c(0,0,0,1,0,2,0),y=c(3,NA,NA,NA,NA,NA,NA))
  x  y
1 0  3
2 0 NA
3 0 NA
4 1 NA
5 0 NA
6 2 NA
7 0 NA
The y column is a delay, and the x column is a waiting time. Since waiting reduces the remaining delay, I want to get something like
  x y
1 0 3
2 0 3
3 0 3
4 1 2
5 0 2
6 2 0
7 0 0
Using a loop is the easiest way, but I am looking for a solution using dplyr. I tried lag() and ifelse(), but I keep getting NA.
Upvotes: 0
Views: 963
Reputation: 43334
You can subtract the cumsum (cumulative sum) of column x from the initial value of y, so in dplyr,
d <- data.frame(x = c(0,0,0,1,0,2,0),
                y = c(3,NA,NA,NA,NA,NA,NA))
library(dplyr)
d %>% mutate(y = first(y) - cumsum(x))
#>   x y
#> 1 0 3
#> 2 0 3
#> 3 0 3
#> 4 1 2
#> 5 0 2
#> 6 2 0
#> 7 0 0
or in pure base, your favorite variant of
d$y <- d$y[1] - cumsum(d$x)
d
#>   x y
#> 1 0 3
#> 2 0 3
#> 3 0 3
#> 4 1 2
#> 5 0 2
#> 6 2 0
#> 7 0 0
More broadly, you can use Reduce with accumulate = TRUE to build more complicated cumulative functions:
Reduce(`-`, d$x, init = d$y[1], accumulate = TRUE)
#> [1] 3 3 3 3 2 2 0 0
or its tidyverse version purrr::accumulate:
purrr::accumulate(d$x, `-`, .init = d$y[1])
#> [1] 3 3 3 3 2 2 0 0
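Note that both calls return one more value than there are rows, because the initial value comes first. As a rough sketch of a "more complicated" cumulative function, here is a variant that never lets the delay drop below zero and drops the leading initial value so the result lines up with the data frame. The helper name step_down and the c(1, 5, 1) waits are made up for illustration; they are not from the question.

```r
# Same data as in the question
d <- data.frame(x = c(0, 0, 0, 1, 0, 2, 0),
                y = c(3, NA, NA, NA, NA, NA, NA))

# Hypothetical step function: subtract the wait, but floor the result at zero
step_down <- function(remaining, wait) max(remaining - wait, 0)

# With init and accumulate = TRUE, Reduce returns length(x) + 1 values;
# the first is init itself, so [-1] drops it to match the number of rows
d$y <- Reduce(step_down, d$x, accumulate = TRUE, init = d$y[1])[-1]

# Illustrative waits where the floor matters: a plain cumsum would go negative
clamped   <- Reduce(step_down, c(1, 5, 1), accumulate = TRUE, init = 3)[-1]
unclamped <- 3 - cumsum(c(1, 5, 1))
```

For the question's data the floor never kicks in, so d$y matches the cumsum result; with the made-up waits, clamped is 2 0 0 while unclamped is 2 -3 -4.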
Upvotes: 7