jakej

Mutate a new column based on lagged values within that column - dplyr approach

A base approach and a dplyr approach were detailed here: "How to create a column which use its own lag value using dplyr".

I want the first row to equal k, and every subsequent row to be the lagged value of "c" plus "a" minus "b".
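Written as a recurrence (a sketch of the math, with i as the row index and n as the number of rows), the requirement unrolls into a running sum that starts from k:

```latex
c_1 = k, \qquad c_i = c_{i-1} + a_i - b_i \quad (i = 2, \dots, n)
\;\Longrightarrow\; c_i = k + \sum_{j=2}^{i} (a_j - b_j)
```

The first difference a_1 - b_1 never enters the result, because row 1 is simply set to k. In the example data a_1 - b_1 = 0, so adding k to the full cumulative sum gives the same numbers.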

The base approach works.

But the dplyr approach does not produce the same result as the base approach. See:

library(tidyverse)
k <- 10 # Set a k value
df1 <- tribble(
  ~a, ~b,
  1,  1,
  1,  2,
  1,  3,
  1,  4,
  1,  5,)
# Base approach
df1$c <- df1$a - df1$b
df1[1, "c"] <- k
df1$c <- cumsum(df1$c)
df1
#> # A tibble: 5 x 3
#>       a     b     c
#>   <dbl> <dbl> <dbl>
#> 1     1     1    10
#> 2     1     2     9
#> 3     1     3     7
#> 4     1     4     4
#> 5     1     5     0
# New df
df2 <- tribble(
  ~a, ~b,
  1,  1,
  1,  2,
  1,  3,
  1,  4,
  1,  5,)
# dplyr approach
df2 %>% 
  mutate(c = lag(cumsum(a - b), 
                 default = k))
#> # A tibble: 5 x 3
#>       a     b     c
#>   <dbl> <dbl> <dbl>
#> 1     1     1    10
#> 2     1     2     0
#> 3     1     3    -1
#> 4     1     4    -3
#> 5     1     5    -6
# Gives two different dataframes

Created on 2020-03-05 by the reprex package (v0.3.0)
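Why the two results differ: `lag()` only shifts a vector down by one position and fills the vacated first slot with `default`; it never adds k to the later rows. A minimal sketch on the plain difference vector (the name `d` is introduced here for illustration only):

```r
library(dplyr)

k <- 10
d <- c(1, 1, 1, 1, 1) - c(1, 2, 3, 4, 5)  # a - b from the example: 0 -1 -2 -3 -4

cumsum(d)                    # 0 -1 -3 -6 -10
lag(cumsum(d), default = k)  # 10 0 -1 -3 -6  (shifted one row; k appears only in row 1)
k + cumsum(d)                # 10 9 7 4 0     (k is added to every row)
```

The shifted version matches the dplyr output above, while the offset version matches the base approach.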

Alternative code and desired output:

library(tidyverse)
# Desired output
tribble(
  ~a, ~b, ~c,
  1, 1, 10,
  1, 2, 9,
  1, 3, 7,
  1, 4, 4,
  1, 5, 0)
#> # A tibble: 5 x 3
#>       a     b     c
#>   <dbl> <dbl> <dbl>
#> 1     1     1    10
#> 2     1     2     9
#> 3     1     3     7
#> 4     1     4     4
#> 5     1     5     0
df2 <- tribble(
  ~a, ~b,
  1,  1,
  1,  2,
  1,  3,
  1,  4,
  1,  5,)
k <- 10
df2 %>% 
  mutate(c = case_when(
    row_number() == 1 ~ k,
    row_number() != 1 ~ lag(c) + a - b))
#> Error in x[seq_len(xlen - n)]: object of type 'builtin' is not subsettable

Created on 2020-03-05 by the reprex package (v0.3.0)
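The error arises because column `c` does not exist yet when `mutate()` evaluates `lag(c)`, so `c` is found as base R's `c()` function, and `lag()` fails when it tries to subset that function. More fundamentally, `mutate()` computes the whole column as one vector, so an expression cannot read earlier values of the column it is still creating. The row-by-row logic that the `case_when()` attempt aims for can be written as an explicit loop; this is a base-R sketch, and the helper name `recur_c` is hypothetical:

```r
# Hypothetical helper: c[1] = k, then c[i] = c[i - 1] + a[i] - b[i]
recur_c <- function(a, b, k) {
  n <- length(a)
  out <- numeric(n)
  if (n == 0) return(out)  # nothing to compute for empty input
  out[1] <- k
  # seq_len(n)[-1] is 2..n, or empty when n == 1
  for (i in seq_len(n)[-1]) {
    out[i] <- out[i - 1] + a[i] - b[i]
  }
  out
}

recur_c(a = c(1, 1, 1, 1, 1), b = c(1, 2, 3, 4, 5), k = 10)
# 10 9 7 4 0
```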

Is there another tidyverse approach that provides the output of the base approach?
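One general tidyverse option for this kind of recurrence is `purrr::accumulate()`, which carries a running value through a vector. A sketch, assuming purrr is installed (it ships with the tidyverse) and the data has at least one row: seed the accumulation with `.init = k` and feed it the differences from row 2 onward, so the output has exactly one value per row.

```r
library(dplyr)
library(purrr)

k <- 10
df2 <- tibble(a = c(1, 1, 1, 1, 1), b = c(1, 2, 3, 4, 5))

# accumulate() starts from .init = k and adds each later difference in turn;
# dropping the first difference with [-1] keeps the length equal to nrow(df2)
df2 <- df2 %>%
  mutate(c = accumulate((a - b)[-1], `+`, .init = k))

df2$c
# 10 9 7 4 0
```

Because `.f` can be any two-argument function, the same pattern also covers recurrences that a `cumsum()` offset cannot express.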

Answers (1)

Ronak Shah

We can do:

library(dplyr)
df2 %>% mutate(c = k + cumsum(a - b))

# A tibble: 5 x 3
#      a     b     c
#  <dbl> <dbl> <dbl>
#1     1     1    10
#2     1     2     9
#3     1     3     7
#4     1     4     4
#5     1     5     0

When the first value of a - b is not 0, we can use:

df2 %>% mutate(c = c(k, k + cumsum((a - b)[-1])))
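A quick way to test a formula for the case where the first difference is not 0 is to compare it against a direct loop over the recurrence. The data frame `df3` below is made up for illustration, with a[1] - b[1] = 3:

```r
library(dplyr)

k <- 10
# Made-up data whose first difference a[1] - b[1] is 3, not 0
df3 <- tibble(a = c(5, 1, 1), b = c(2, 2, 3))

# Reference: apply the recurrence row by row
expected <- numeric(nrow(df3))
expected[1] <- k
for (i in seq_len(nrow(df3))[-1]) {
  expected[i] <- expected[i - 1] + df3$a[i] - df3$b[i]
}

# Vectorised version: k in row 1, then k plus the running sum of the
# differences from row 2 onward (the first difference is excluded)
df3 <- df3 %>% mutate(c = c(k, k + cumsum((a - b)[-1])))

all.equal(df3$c, expected)  # TRUE; both are 10 9 7
```

Taking `cumsum(a - b)[-1]` instead would still include the first difference and give 10 12 10 here.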
