Reputation: 301
I'm trying to create a new column in my dataframe by subtracting the value in the leading row from one column from the value in another column. Then, iteratively subtract from the newly created column. So it's not a simple lagged difference between two rows in the same column, it's a bit more complicated. An example should help:
Current dataframe:
> df
year id wd dia
1 2018 A218t05 0.443 403.1944
2 2017 A218t05 0.422 NA
3 2016 A218t05 0.178 NA
4 2015 A218t05 0.323 NA
5 2014 A218t05 0.132 NA
6 2013 A218t05 0.538 NA
7 2012 A218t05 0.236 NA
8 2011 A218t05 0.244 NA
9 2010 A218t05 0.157 NA
10 2009 A218t05 0.207 NA
11 2008 A218t05 0.306 NA
12 2007 A218t05 0.157 NA
13 2006 A218t05 0.207 NA
14 2005 A218t05 0.145 NA
15 2004 A218t05 0.240 NA
16 2003 A218t05 0.232 NA
17 2002 A218t05 0.344 NA
18 2001 A218t05 0.600 NA
19 2000 A218t05 0.559 NA
Desired output:
> desired
year id wd dia dia_year
1 2018 A218t05 0.443 403.1944 403.1944
2 2017 A218t05 0.422 . 402.7724
3 2016 A218t05 0.178 . 402.5944
4 2015 A218t05 0.323 . 402.2714
5 2014 A218t05 0.132 . 402.1394
6 2013 A218t05 0.538 . 401.6014
7 2012 A218t05 0.236 . 401.3654
8 2011 A218t05 0.244 . 401.1214
9 2010 A218t05 0.157 . 400.9644
10 2009 A218t05 0.207 . 400.7574
11 2008 A218t05 0.306 . 400.4514
12 2007 A218t05 0.157 . 400.2944
13 2006 A218t05 0.207 . 400.0874
14 2005 A218t05 0.145 . 399.9424
15 2004 A218t05 0.240 . 399.7024
16 2003 A218t05 0.232 . 399.4704
17 2002 A218t05 0.344 . 399.1264
18 2001 A218t05 0.600 . 398.5264
19 2000 A218t05 0.559 . 397.9674
The dia_year
column should show the original dia
in the first row, then be a running difference of the dia_year
and wd
value from the previous year. It is vital that the data remained ordered descending by year. I've tried the following code (and numerous variations thereof), but I normally get NAs or errors:
df.test <- df %>%
dplyr::group_by(id) %>%
dplyr::arrange(id, -year) %>%
dplyr::mutate(dia_year = ifelse(year=="2018", dia, dia - lead(wd, default=first(wd))))
df.test
year id wd dia dia_year
<int> <chr> <dbl> <dbl> <dbl>
1 2018 A218t05 0.443 403. 403.
2 2017 A218t05 0.422 NA NA
3 2016 A218t05 0.178 NA NA
4 2015 A218t05 0.323 NA NA
5 2014 A218t05 0.132 NA NA
6 2013 A218t05 0.538 NA NA
7 2012 A218t05 0.236 NA NA
8 2011 A218t05 0.244 NA NA
9 2010 A218t05 0.157 NA NA
10 2009 A218t05 0.207 NA NA
11 2008 A218t05 0.306 NA NA
12 2007 A218t05 0.157 NA NA
13 2006 A218t05 0.207 NA NA
14 2005 A218t05 0.145 NA NA
15 2004 A218t05 0.24 NA NA
16 2003 A218t05 0.232 NA NA
17 2002 A218t05 0.344 NA NA
18 2001 A218t05 0.6 NA NA
19 2000 A218t05 0.559 NA NA
As you can see, the ifelse
statement works for the first row, but the lagged difference won't simply prints NAs. Ideally I could do this in dplyr
, but not required. FWIW, by original dataset contains 293 unique id
s and ~57k rows. Any help much appreciated!
df <- structure(list(year = 2018:2000, id = c("A218t05", "A218t05",
"A218t05", "A218t05", "A218t05", "A218t05", "A218t05", "A218t05",
"A218t05", "A218t05", "A218t05", "A218t05", "A218t05", "A218t05",
"A218t05", "A218t05", "A218t05", "A218t05", "A218t05"), wd = c(0.443,
0.422, 0.178, 0.323, 0.132, 0.538, 0.236, 0.244, 0.157, 0.207,
0.306, 0.157, 0.207, 0.145, 0.24, 0.232, 0.344, 0.6, 0.559),
dia = c(403.1944, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -19L), class = "data.frame")
Upvotes: 1
Views: 154
Reputation: 18541
I think we can use purrr::accumulate()
. The trick is to use the first dia[1]
as .init
and we have to skip the first element of wd[-1]
.
library(dplyr)
library(purrr)
df %>%
mutate(dia_year = purrr::accumulate(wd[-1], `-`, .init = dia[1]))
#> year id wd dia dia_year
#> 1 2018 A218t05 0.443 403.1944 403.1944
#> 2 2017 A218t05 0.422 NA 402.7724
#> 3 2016 A218t05 0.178 NA 402.5944
#> 4 2015 A218t05 0.323 NA 402.2714
#> 5 2014 A218t05 0.132 NA 402.1394
#> 6 2013 A218t05 0.538 NA 401.6014
#> 7 2012 A218t05 0.236 NA 401.3654
#> 8 2011 A218t05 0.244 NA 401.1214
#> 9 2010 A218t05 0.157 NA 400.9644
#> 10 2009 A218t05 0.207 NA 400.7574
#> 11 2008 A218t05 0.306 NA 400.4514
#> 12 2007 A218t05 0.157 NA 400.2944
#> 13 2006 A218t05 0.207 NA 400.0874
#> 14 2005 A218t05 0.145 NA 399.9424
#> 15 2004 A218t05 0.240 NA 399.7024
#> 16 2003 A218t05 0.232 NA 399.4704
#> 17 2002 A218t05 0.344 NA 399.1264
#> 18 2001 A218t05 0.600 NA 398.5264
#> 19 2000 A218t05 0.559 NA 397.9674
Created on 2022-11-04 with reprex v2.0.2
Upvotes: 1