Xavier Prudent

Reputation: 1712

Update value in dataframe rows with dplyr

I stumbled over what looked to me like a straightforward issue:

I have a data frame like this:

d <- data.frame(x=c(0,0,0,1,0,2,0),y=c(3,NA,NA,NA,NA,NA,NA))

  x  y
1 0  3
2 0 NA
3 0 NA
4 1 NA
5 0 NA
6 2 NA
7 0 NA

The y column is a delay, and the x column is a waiting time. Since waiting reduces the remaining delay, I want to end up with something like:

  x  y
1 0  3
2 0  3
3 0  3
4 1  2
5 0  2
6 2  0
7 0  0

A loop would be the easiest way, but I am looking for a solution using dplyr. I tried lag() and ifelse(), but I keep getting NA.

Upvotes: 0

Views: 963

Answers (1)

alistaire

Reputation: 43334

You can subtract the cumsum (cumulative sum) of column x from the initial value of y, so in dplyr,

d <- data.frame(x = c(0,0,0,1,0,2,0),
                y = c(3,NA,NA,NA,NA,NA,NA))

library(dplyr)

d %>% mutate(y = first(y) - cumsum(x))
#>   x y
#> 1 0 3
#> 2 0 3
#> 3 0 3
#> 4 1 2
#> 5 0 2
#> 6 2 0
#> 7 0 0

or in pure base, your favorite variant of

d$y <- d$y[1] - cumsum(d$x)

d
#>   x y
#> 1 0 3
#> 2 0 3
#> 3 0 3
#> 4 1 2
#> 5 0 2
#> 6 2 0
#> 7 0 0
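
If the data frame held several independent sequences, the dplyr version would presumably carry over by computing the same expression per group. A minimal sketch, assuming a hypothetical id column (not in the question) that marks each sequence and an initial delay in the first row of each:

```r
library(dplyr)

# Hypothetical data: two sequences, "a" and "b", each with its own initial delay
d2 <- data.frame(id = c("a", "a", "a", "b", "b", "b"),
                 x  = c(0, 1, 0, 2, 0, 1),
                 y  = c(3, NA, NA, 5, NA, NA))

# first(y) and cumsum(x) are evaluated within each id group
res <- d2 %>%
  group_by(id) %>%
  mutate(y = first(y) - cumsum(x)) %>%
  ungroup()

res$y
#> [1] 3 2 2 3 3 2
```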

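More broadly, you can use Reduce with accumulate = TRUE to build more complicated cumulative functions. Note that with init supplied, the result starts with the initial value, so it has one more element than d$x: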

Reduce(`-`, d$x, init = d$y[1], accumulate = TRUE)
#> [1] 3 3 3 3 2 2 0 0

or its tidyverse version purrr::accumulate:

purrr::accumulate(d$x, `-`, .init = d$y[1])
#> [1] 3 3 3 3 2 2 0 0
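
As a sketch of what a "more complicated" cumulative function could look like, suppose (an assumption, not something the question asks for) that the delay must never drop below zero. cumsum alone cannot express that floor, but Reduce with a custom step function can. The data and the helper name step_delay are made up for illustration:

```r
# Hypothetical data: the last wait (3) would push a plain cumsum below zero
x  <- c(0, 1, 0, 2, 0, 3)
y0 <- 3  # assumed initial delay

# Illustrative helper: subtract the wait from the running delay, floored at 0
step_delay <- function(delay, wait) max(delay - wait, 0)

# Reduce returns the init value first, so drop it with [-1]
# to get one value per element of x
clamped <- Reduce(step_delay, x, accumulate = TRUE, init = y0)[-1]
clamped
#> [1] 3 2 2 0 0 0

# For comparison, the plain cumsum version goes negative
y0 - cumsum(x)
#> [1]  3  2  2  0  0 -3
```

Dropping the first element is also what makes the result fit back into a column of the same length as x.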

Upvotes: 7
