woodland_creature
woodland_creature

Reputation: 301

Subtracting lagged values from different columns R

I'm trying to create a new column in my dataframe by subtracting the value in the leading row from one column from the value in another column. Then, iteratively subtract from the newly created column. So it's not a simple lagged difference between two rows in the same column, it's a bit more complicated. An example should help:

Current dataframe:

> df
   year      id    wd      dia
1  2018 A218t05 0.443 403.1944
2  2017 A218t05 0.422       NA
3  2016 A218t05 0.178       NA
4  2015 A218t05 0.323       NA
5  2014 A218t05 0.132       NA
6  2013 A218t05 0.538       NA
7  2012 A218t05 0.236       NA
8  2011 A218t05 0.244       NA
9  2010 A218t05 0.157       NA
10 2009 A218t05 0.207       NA
11 2008 A218t05 0.306       NA
12 2007 A218t05 0.157       NA
13 2006 A218t05 0.207       NA
14 2005 A218t05 0.145       NA
15 2004 A218t05 0.240       NA
16 2003 A218t05 0.232       NA
17 2002 A218t05 0.344       NA
18 2001 A218t05 0.600       NA
19 2000 A218t05 0.559       NA

Desired output:

> desired
   year      id    wd      dia dia_year
1  2018 A218t05 0.443 403.1944 403.1944
2  2017 A218t05 0.422        . 402.7724
3  2016 A218t05 0.178        . 402.5944
4  2015 A218t05 0.323        . 402.2714
5  2014 A218t05 0.132        . 402.1394
6  2013 A218t05 0.538        . 401.6014
7  2012 A218t05 0.236        . 401.3654
8  2011 A218t05 0.244        . 401.1214
9  2010 A218t05 0.157        . 400.9644
10 2009 A218t05 0.207        . 400.7574
11 2008 A218t05 0.306        . 400.4514
12 2007 A218t05 0.157        . 400.2944
13 2006 A218t05 0.207        . 400.0874
14 2005 A218t05 0.145        . 399.9424
15 2004 A218t05 0.240        . 399.7024
16 2003 A218t05 0.232        . 399.4704
17 2002 A218t05 0.344        . 399.1264
18 2001 A218t05 0.600        . 398.5264
19 2000 A218t05 0.559        . 397.9674

The dia_year column should show the original dia in the first row, then be a running difference of the dia_year and wd value from the previous year. It is vital that the data remained ordered descending by year. I've tried the following code (and numerous variations thereof), but I normally get NAs or errors:

df.test <- df %>%
  dplyr::group_by(id) %>%
  dplyr::arrange(id, -year) %>%
  dplyr::mutate(dia_year = ifelse(year=="2018", dia, dia - lead(wd, default=first(wd))))
df.test
    year id         wd   dia dia_year
   <int> <chr>   <dbl> <dbl>    <dbl>
 1  2018 A218t05 0.443  403.     403.
 2  2017 A218t05 0.422   NA       NA 
 3  2016 A218t05 0.178   NA       NA 
 4  2015 A218t05 0.323   NA       NA 
 5  2014 A218t05 0.132   NA       NA 
 6  2013 A218t05 0.538   NA       NA 
 7  2012 A218t05 0.236   NA       NA 
 8  2011 A218t05 0.244   NA       NA 
 9  2010 A218t05 0.157   NA       NA 
10  2009 A218t05 0.207   NA       NA 
11  2008 A218t05 0.306   NA       NA 
12  2007 A218t05 0.157   NA       NA 
13  2006 A218t05 0.207   NA       NA 
14  2005 A218t05 0.145   NA       NA 
15  2004 A218t05 0.24    NA       NA 
16  2003 A218t05 0.232   NA       NA 
17  2002 A218t05 0.344   NA       NA 
18  2001 A218t05 0.6     NA       NA 
19  2000 A218t05 0.559   NA       NA 

As you can see, the ifelse statement works for the first row, but the lagged difference won't simply prints NAs. Ideally I could do this in dplyr, but not required. FWIW, by original dataset contains 293 unique ids and ~57k rows. Any help much appreciated!

df <- structure(list(year = 2018:2000, id = c("A218t05", "A218t05", 
"A218t05", "A218t05", "A218t05", "A218t05", "A218t05", "A218t05", 
"A218t05", "A218t05", "A218t05", "A218t05", "A218t05", "A218t05", 
"A218t05", "A218t05", "A218t05", "A218t05", "A218t05"), wd = c(0.443, 
0.422, 0.178, 0.323, 0.132, 0.538, 0.236, 0.244, 0.157, 0.207, 
0.306, 0.157, 0.207, 0.145, 0.24, 0.232, 0.344, 0.6, 0.559), 
    dia = c(403.1944, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -19L), class = "data.frame") 

Upvotes: 1

Views: 154

Answers (1)

TimTeaFan
TimTeaFan

Reputation: 18541

I think we can use purrr::accumulate(). The trick is to use the first dia[1] as .init and we have to skip the first element of wd[-1].

library(dplyr)
library(purrr)

df %>% 
  mutate(dia_year = purrr::accumulate(wd[-1], `-`, .init = dia[1]))
#>    year      id    wd      dia dia_year
#> 1  2018 A218t05 0.443 403.1944 403.1944
#> 2  2017 A218t05 0.422       NA 402.7724
#> 3  2016 A218t05 0.178       NA 402.5944
#> 4  2015 A218t05 0.323       NA 402.2714
#> 5  2014 A218t05 0.132       NA 402.1394
#> 6  2013 A218t05 0.538       NA 401.6014
#> 7  2012 A218t05 0.236       NA 401.3654
#> 8  2011 A218t05 0.244       NA 401.1214
#> 9  2010 A218t05 0.157       NA 400.9644
#> 10 2009 A218t05 0.207       NA 400.7574
#> 11 2008 A218t05 0.306       NA 400.4514
#> 12 2007 A218t05 0.157       NA 400.2944
#> 13 2006 A218t05 0.207       NA 400.0874
#> 14 2005 A218t05 0.145       NA 399.9424
#> 15 2004 A218t05 0.240       NA 399.7024
#> 16 2003 A218t05 0.232       NA 399.4704
#> 17 2002 A218t05 0.344       NA 399.1264
#> 18 2001 A218t05 0.600       NA 398.5264
#> 19 2000 A218t05 0.559       NA 397.9674

Created on 2022-11-04 with reprex v2.0.2

Upvotes: 1

Related Questions