Reputation: 802
I have a data.table with observations of temp and other weather info over 15 days for several sites. This dput
is for all observations for two sites.
library(data.table)
structure(list(site = c("100", "100", "100", "100", "100", "100",
"100", "100", "100", "100", "100", "100", "100", "100", "100"
), precursor_date = structure(c(15203, 15202, 15201, 15200, 15199,
15198, 15197, 15196, 15195, 15194, 15193, 15192, 15191, 15190,
15189), class = "Date"), lat = c(46.864, 46.864, 46.864, 46.864,
46.864, 46.864, 46.864, 46.864, 46.864, 46.864, 46.864, 46.864,
46.864, 46.864, 46.864), lon = c(-67.998, -67.998, -67.998, -67.998,
-67.998, -67.998, -67.998, -67.998, -67.998, -67.998, -67.998,
-67.998, -67.998, -67.998, -67.998), origDate = structure(c(15204,
15204, 15204, 15204, 15204, 15204, 15204, 15204, 15204, 15204,
15204, 15204, 15204, 15204, 15204), class = "Date"), last = c(2011,
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,
2011, 2011, 2011), begin = c(2011, 2011, 2011, 2011, 2011, 2011,
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011), precursor_day_labl = structure(1:15, .Label = c("obd_p1",
"obd_p2", "obd_p3", "obd_p4", "obd_p5", "obd_p6", "obd_p7", "obd_p8",
"obd_p9", "obd_p10", "obd_p11", "obd_p12", "obd_p13", "obd_p14",
"obd_p15"), class = "factor"), year = c(2011, 2011, 2011, 2011,
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011
), yday = c(229, 228, 227, 226, 225, 224, 223, 222, 221, 220,
219, 218, 217, 216, 215), dayl = c(50112, 50457.6015625, 50457.6015625,
50803.19921875, 50803.19921875, 51148.80078125, 51148.80078125,
51494.3984375, 51494.3984375, 51840, 51840, 52185.6015625, 52185.6015625,
52531.19921875, 52531.19921875), prcp = c(0, 17, 5, 4, 6, 6,
13, 8, 0, 16, 14, 6, 0, 0, 7), srad = c(403.200012207031, 176,
249.600006103516, 288, 297.600006103516, 268.799987792969, 179.199996948242,
192, 406.399993896484, 208, 227.199996948242, 307.200012207031,
371.200012207031, 304, 182.399993896484), swe = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), tmax = c(22.5, 20.5, 24.5,
26.5, 25, 22.5, 20.5, 21, 24, 23.5, 25, 28, 24, 23.5, 22), tmin = c(10.5,
14.5, 15, 14.5, 12.5, 12, 14, 14, 11, 15.5, 16, 14, 12.5, 14.5,
15), vp = c(1280, 1640, 1720, 1640, 1440, 1400, 1600, 1600, 1320,
1760, 1800, 1600, 1440, 1640, 1720), satv = c(19.99250234375,
17.77504867875, 22.44414580875, 25.14497571375, 23.09540625,
19.99250234375, 17.77504867875, 18.30827693, 21.80845952, 21.18811306125,
23.09540625, 27.34314816, 21.80845952, 21.18811306125, 19.41676944
), r_hum = c(64.024001497749, 92.2641636397099, 76.6346830330004,
65.2217770527891, 62.3500614976193, 70.026251638163, 90.0138181850829,
87.3921672758967, 60.526971141151, 83.0654431053979, 77.9375768720241,
58.5155736507555, 66.0294230630738, 77.4018901663935, 88.5832221119478
)), class = c("data.table", "data.frame"), row.names = c(NA,
-15L), .internal.selfref = <pointer: 0x000001b632fd1ef0>)
I want to get the mean of the weather data (prcp
, tmax
, tmin
, r_hum
) for each #-day interval of the 15 days moving backward in time from the start day, which I called origDate
in the DT
. The dates that go into each respective fifteen day window for each site is under precursor_date
. There would only be one 2-day mean, one 3-day mean, one 4-day mean, etc. and those would be based on the respective #-day window that immediately preceded the origDate
. For example, if the start day is 2011-08-18, I want the mean for 2 days before 8-18( 08-17 and 08-16), 3 days (08-17, 08-16, 08-15), etc., up to the largest window, 15-day (08-17 through 08-03). I don't need every small interval mean possible in the 15 day window. Just the one that immediately precedes origDate
.
To give an idea of what the subset I want, in dplyr
, I can use
df %>% group_by(site) %>% slice_head(n= x)
# A tibble: 2,556 x 19
# Groups: site [1,278]
site precursor_date lat lon origDate last begin precursor_day_l~ year yday dayl prcp srad swe tmax
<chr> <date> <dbl> <dbl> <date> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 100 2011-08-17 46.9 -68.0 2011-08-18 2011 2011 obd_p1 2011 229 50112 0 403. 0 22.5
2 100 2011-08-16 46.9 -68.0 2011-08-18 2011 2011 obd_p2 2011 228 50458. 17 176 0 20.5
3 101 2011-08-11 44.3 -72.7 2011-08-12 2011 2011 obd_p1 2011 223 50458. 8 272 0 26
4 101 2011-08-10 44.3 -72.7 2011-08-12 2011 2011 obd_p2 2011 222 50803. 25 253. 0 26.5
5 102 2011-08-21 46.5 -68.0 2011-08-22 2011 2011 obd_p1 2011 233 49421. 0 378. 0 27
6 102 2011-08-20 46.5 -68.0 2011-08-22 2011 2011 obd_p2 2011 232 49421. 1 397. 0 28
where x is the number of days I want to subset from each group before getting means. But if I use
df %>% group_by(site) %>% slice_head(n= x) %>% mean(prcp)
I get an error and I don't know why. The error is
Warning message:
In mean.default(., "prcp") :
argument is not numeric or logical: returning NA
While I don't know why that error occurs, I'm more intent on getting the subsets to work within a data.table
. The columns that I want subset means for are prcp, tmax, tmin, and r_hum. I would end up creating 60 new columns, 15 for each weather variable. And a lot of the columns would have NA or something, since the DT has the daily observations in columns. To give an idea of what the output might look like, here is a mock-up. It doesn't have to look this way, as long as I have the means for each weather variable and time window in the DT aligned with the appropriate site.
site precursor_date lat lon origDate ... prcp2dmean prcp3dmean prcp4dmean ... tmax2dmean tmax3dmean ...
100 2011-08-17 46.864 -67.998 2011-08-18 ... 1.2 1.4 1.4 ... 25 24 ...
100 2011-08-16 46.864 -67.998 2011-08-18 ... 1.2 1.4 1.4 ... 25 24 ...
100 2011-08-15 46.864 -67.998 2011-08-18 ... NA 1.4 1.4 ... NA 24 ...
100 2011-08-14 46.864 -67.998 2011-08-18 ... NA NA 1.4 ... NA NA ...
100 2011-08-13 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
100 2011-08-12 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
100 2011-08-11 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
100 2011-08-10 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
100 2011-08-09 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
100 2011-08-08 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
100 2011-08-07 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
100 2011-08-06 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
100 2011-08-05 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
100 2011-08-04 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
100 2011-08-03 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
10 2011-08-17 46.864 -67.998 2011-08-18 ... 1.2 1.4 1.4 ... 25 24 ...
10 2011-08-16 46.864 -67.998 2011-08-18 ... 1.2 1.4 1.4 ... 25 24 ...
10 2011-08-15 46.864 -67.998 2011-08-18 ... NA 1.4 1.4 ... NA 24 ...
10 2011-08-14 46.864 -67.998 2011-08-18 ... NA NA 1.4 ... NA NA ...
10 2011-08-13 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
10 2011-08-12 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
10 2011-08-11 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
10 2011-08-10 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
10 2011-08-09 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
10 2011-08-08 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
10 2011-08-07 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
10 2011-08-06 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
10 2011-08-05 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
10 2011-08-04 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
10 2011-08-03 46.864 -67.998 2011-08-18 ... NA NA NA ... NA NA ...
In my DT I tried
pi_df5[, pi_df5 %>% slice_head(n=2) %>% mean(prcp), by = site]
But that doesn't work.
Upvotes: 5
Views: 111
Reputation: 18541
Here is a approach using {data.table}'s frollmean
. Note that this approach produces a lot of information you might not need. For example prcp2
shows the two day average for every day it can calculate that average without showing NA
s. If I understand you correctly you would only need the two day average calculated for the last day. On the other hand, the 15 day average can only be calculated once, on the last day, while in your example output the 15 day average is shown in every single row.
Note: An earlier version of this post used eval(parse())
which is not necessary, since frollmean
can take data.frames
and vectors as x
and n
arguments as @jangorecki pointed out correctly in the comments.
Apologies for the long reprex
most of it is the original data:
library(data.table)
# original data
dat <-
structure(list(site = c("100", "100", "100", "100", "100", "100", "100", "100",
"100", "100", "100", "100", "100", "100", "100"),
precursor_date = structure(c(15203, 15202, 15201, 15200, 15199,
15198, 15197, 15196, 15195, 15194, 15193, 15192,
15191, 15190, 15189), class = "Date"),
lat = c(46.864, 46.864, 46.864, 46.864, 46.864, 46.864, 46.864,
46.864, 46.864, 46.864, 46.864, 46.864,46.864, 46.864,
46.864),
lon = c(-67.998, -67.998, -67.998, -67.998, -67.998, -67.998,
-67.998, -67.998, -67.998, -67.998, -67.998, -67.998,
-67.998, -67.998, -67.998),
origDate = structure(c(15204, 15204, 15204, 15204, 15204, 15204,
15204, 15204, 15204, 15204, 15204, 15204,
15204, 15204, 15204),class = "Date"),
last = c(2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,
2011, 2011, 2011, 2011, 2011, 2011),
begin = c(2011, 2011, 2011, 2011, 2011, 2011,2011, 2011, 2011,
2011, 2011, 2011, 2011, 2011, 2011),
precursor_day_labl = structure(1:15,
.Label = c("obd_p1", "obd_p2", "obd_p3", "obd_p4", "obd_p5",
"obd_p6", "obd_p7", "obd_p8", "obd_p9", "obd_p10",
"obd_p11", "obd_p12", "obd_p13", "obd_p14", "obd_p15"),
class = "factor"),
year = c(2011, 2011, 2011, 2011, 2011, 2011,2011, 2011, 2011,
2011, 2011, 2011, 2011, 2011, 2011),
yday = c(229, 228, 227, 226, 225, 224, 223, 222, 221, 220, 219,
218, 217, 216, 215),
dayl = c(50112, 50457.6015625, 50457.6015625, 50803.19921875,
50803.19921875, 51148.80078125, 51148.80078125,
51494.3984375, 51494.3984375, 51840, 51840, 52185.6015625,
52185.6015625, 52531.19921875, 52531.19921875),
prcp = c(0, 17, 5, 4, 6, 6, 13, 8, 0, 16, 14, 6, 0, 0, 7),
srad = c(403.200012207031, 176, 249.600006103516, 288,
297.600006103516, 268.799987792969, 179.199996948242,
192, 406.399993896484, 208, 227.199996948242, 307.200012207031,
371.200012207031, 304, 182.399993896484),
swe = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
tmax = c(22.5, 20.5, 24.5, 26.5, 25, 22.5, 20.5, 21, 24, 23.5,
25, 28, 24, 23.5, 22),
tmin = c(10.5, 14.5, 15, 14.5, 12.5, 12, 14, 14, 11, 15.5, 16,
14, 12.5, 14.5, 15),
vp = c(1280, 1640, 1720, 1640, 1440, 1400, 1600, 1600, 1320, 1760,
1800, 1600, 1440, 1640, 1720),
satv = c(19.99250234375, 17.77504867875, 22.44414580875,
25.14497571375, 23.09540625,19.99250234375,
17.77504867875, 18.30827693, 21.80845952, 21.18811306125,
23.09540625, 27.34314816, 21.80845952, 21.18811306125,
19.41676944),
r_hum = c(64.024001497749, 92.2641636397099, 76.6346830330004,
65.2217770527891, 62.3500614976193, 70.026251638163,
90.0138181850829, 87.3921672758967, 60.526971141151,
83.0654431053979, 77.9375768720241, 58.5155736507555,
66.0294230630738, 77.4018901663935, 88.5832221119478)
), class = c("data.table", "data.frame"), row.names = c(NA, -15L))
# the actual calculation starts here -------------------------------
mean_vars <- c("prcp", "tmax", "tmin", "r_hum")
setDT(dat)[,
c(paste0(rep(mean_vars, each = 14) , 2:15)) :=
frollmean(.SD, 2:15, align = 'left'),
by = site,
.SDcols = mean_vars][]
#> site precursor_date lat lon origDate last begin precursor_day_labl
#> 1: 100 2011-08-17 46.864 -67.998 2011-08-18 2011 2011 obd_p1
#> 2: 100 2011-08-16 46.864 -67.998 2011-08-18 2011 2011 obd_p2
#> 3: 100 2011-08-15 46.864 -67.998 2011-08-18 2011 2011 obd_p3
#> 4: 100 2011-08-14 46.864 -67.998 2011-08-18 2011 2011 obd_p4
#> 5: 100 2011-08-13 46.864 -67.998 2011-08-18 2011 2011 obd_p5
#> 6: 100 2011-08-12 46.864 -67.998 2011-08-18 2011 2011 obd_p6
#> 7: 100 2011-08-11 46.864 -67.998 2011-08-18 2011 2011 obd_p7
#> 8: 100 2011-08-10 46.864 -67.998 2011-08-18 2011 2011 obd_p8
#> 9: 100 2011-08-09 46.864 -67.998 2011-08-18 2011 2011 obd_p9
#> 10: 100 2011-08-08 46.864 -67.998 2011-08-18 2011 2011 obd_p10
#> 11: 100 2011-08-07 46.864 -67.998 2011-08-18 2011 2011 obd_p11
#> 12: 100 2011-08-06 46.864 -67.998 2011-08-18 2011 2011 obd_p12
#> 13: 100 2011-08-05 46.864 -67.998 2011-08-18 2011 2011 obd_p13
#> 14: 100 2011-08-04 46.864 -67.998 2011-08-18 2011 2011 obd_p14
#> 15: 100 2011-08-03 46.864 -67.998 2011-08-18 2011 2011 obd_p15
#> year yday dayl prcp srad swe tmax tmin vp satv r_hum prcp2
#> 1: 2011 229 50112.0 0 403.2 0 22.5 10.5 1280 19.99250 64.02400 8.5
#> 2: 2011 228 50457.6 17 176.0 0 20.5 14.5 1640 17.77505 92.26416 11.0
#> 3: 2011 227 50457.6 5 249.6 0 24.5 15.0 1720 22.44415 76.63468 4.5
#> 4: 2011 226 50803.2 4 288.0 0 26.5 14.5 1640 25.14498 65.22178 5.0
#> 5: 2011 225 50803.2 6 297.6 0 25.0 12.5 1440 23.09541 62.35006 6.0
#> 6: 2011 224 51148.8 6 268.8 0 22.5 12.0 1400 19.99250 70.02625 9.5
#> 7: 2011 223 51148.8 13 179.2 0 20.5 14.0 1600 17.77505 90.01382 10.5
#> 8: 2011 222 51494.4 8 192.0 0 21.0 14.0 1600 18.30828 87.39217 4.0
#> 9: 2011 221 51494.4 0 406.4 0 24.0 11.0 1320 21.80846 60.52697 8.0
#> 10: 2011 220 51840.0 16 208.0 0 23.5 15.5 1760 21.18811 83.06544 15.0
#> 11: 2011 219 51840.0 14 227.2 0 25.0 16.0 1800 23.09541 77.93758 10.0
#> 12: 2011 218 52185.6 6 307.2 0 28.0 14.0 1600 27.34315 58.51557 3.0
#> 13: 2011 217 52185.6 0 371.2 0 24.0 12.5 1440 21.80846 66.02942 0.0
#> 14: 2011 216 52531.2 0 304.0 0 23.5 14.5 1640 21.18811 77.40189 3.5
#> 15: 2011 215 52531.2 7 182.4 0 22.0 15.0 1720 19.41677 88.58322 NA
#> prcp3 prcp4 prcp5 prcp6 prcp7 prcp8 prcp9 prcp10 prcp11
#> 1: 7.333333 6.50 6.4 6.333333 7.285714 7.375 6.555556 7.5 8.090909
#> 2: 8.666667 8.00 7.6 8.500000 8.428571 7.375 8.333333 8.9 8.636364
#> 3: 5.000000 5.25 6.8 7.000000 6.000000 7.250 8.000000 7.8 7.090909
#> 4: 5.333333 7.25 7.4 6.166667 7.571429 8.375 8.111111 7.3 6.636364
#> 5: 8.333333 8.25 6.6 8.166667 9.000000 8.625 7.666667 6.9 6.909091
#> 6: 9.000000 6.75 8.6 9.500000 9.000000 7.875 7.000000 7.0 NA
#> 7: 7.000000 9.25 10.2 9.500000 8.142857 7.125 7.111111 NA NA
#> 8: 8.000000 9.50 8.8 7.333333 6.285714 6.375 NA NA NA
#> 9: 10.000000 9.00 7.2 6.000000 6.142857 NA NA NA NA
#> 10: 12.000000 9.00 7.2 7.166667 NA NA NA NA NA
#> 11: 6.666667 5.00 5.4 NA NA NA NA NA NA
#> 12: 2.000000 3.25 NA NA NA NA NA NA NA
#> 13: 2.333333 NA NA NA NA NA NA NA NA
#> 14: NA NA NA NA NA NA NA NA NA
#> 15: NA NA NA NA NA NA NA NA NA
#> prcp12 prcp13 prcp14 prcp15 tmax2 tmax3 tmax4 tmax5 tmax6
#> 1: 7.916667 7.307692 6.785714 6.8 21.50 22.50000 23.500 23.8 23.58333
#> 2: 7.916667 7.307692 7.285714 NA 22.50 23.83333 24.125 23.8 23.25000
#> 3: 6.500000 6.538462 NA NA 25.50 25.33333 24.625 23.8 23.33333
#> 4: 6.666667 NA NA NA 25.75 24.66667 23.625 23.1 23.25000
#> 5: NA NA NA NA 23.75 22.66667 22.250 22.6 22.75000
#> 6: NA NA NA NA 21.50 21.33333 22.000 22.3 22.75000
#> 7: NA NA NA NA 20.75 21.83333 22.250 22.8 23.66667
#> 8: NA NA NA NA 22.50 22.83333 23.375 24.3 24.25000
#> 9: NA NA NA NA 23.75 24.16667 25.125 24.9 24.66667
#> 10: NA NA NA NA 24.25 25.50000 25.125 24.8 24.33333
#> 11: NA NA NA NA 26.50 25.66667 25.125 24.5 NA
#> 12: NA NA NA NA 26.00 25.16667 24.375 NA NA
#> 13: NA NA NA NA 23.75 23.16667 NA NA NA
#> 14: NA NA NA NA 22.75 NA NA NA NA
#> 15: NA NA NA NA NA NA NA NA NA
#> tmax7 tmax8 tmax9 tmax10 tmax11 tmax12 tmax13 tmax14
#> 1: 23.14286 22.8750 23.00000 23.05 23.22727 23.62500 23.65385 23.64286
#> 2: 22.92857 23.0625 23.11111 23.30 23.72727 23.75000 23.73077 23.60714
#> 3: 23.42857 23.4375 23.61111 24.05 24.04545 24.00000 23.84615 NA
#> 4: 23.28571 23.5000 24.00000 24.00 23.95455 23.79167 NA NA
#> 5: 23.07143 23.6875 23.72222 23.70 23.54545 NA NA NA
#> 6: 23.50000 23.5625 23.55556 23.40 NA NA NA NA
#> 7: 23.71429 23.6875 23.50000 NA NA NA NA NA
#> 8: 24.14286 23.8750 NA NA NA NA NA NA
#> 9: 24.28571 NA NA NA NA NA NA NA
#> 10: NA NA NA NA NA NA NA NA
#> 11: NA NA NA NA NA NA NA NA
#> 12: NA NA NA NA NA NA NA NA
#> 13: NA NA NA NA NA NA NA NA
#> 14: NA NA NA NA NA NA NA NA
#> 15: NA NA NA NA NA NA NA NA
#> tmax15 tmin2 tmin3 tmin4 tmin5 tmin6 tmin7 tmin8 tmin9
#> 1: 23.53333 12.50 13.33333 13.625 13.4 13.16667 13.28571 13.3750 13.11111
#> 2: NA 14.75 14.66667 14.125 13.7 13.75000 13.78571 13.4375 13.66667
#> 3: NA 14.75 14.00000 13.500 13.6 13.66667 13.28571 13.5625 13.83333
#> 4: NA 13.50 13.00000 13.250 13.4 13.00000 13.35714 13.6875 13.72222
#> 5: NA 12.25 12.83333 13.125 12.7 13.16667 13.57143 13.6250 13.50000
#> 6: NA 13.00 13.33333 12.750 13.3 13.75000 13.78571 13.6250 13.72222
#> 7: NA 14.00 13.00000 13.625 14.1 14.08333 13.85714 13.9375 14.05556
#> 8: NA 12.50 13.50000 14.125 14.1 13.83333 13.92857 14.0625 NA
#> 9: NA 13.25 14.16667 14.125 13.8 13.91667 14.07143 NA NA
#> 10: NA 15.75 15.16667 14.500 14.5 14.58333 NA NA NA
#> 11: NA 15.00 14.16667 14.250 14.4 NA NA NA NA
#> 12: NA 13.25 13.66667 14.000 NA NA NA NA NA
#> 13: NA 13.50 14.00000 NA NA NA NA NA NA
#> 14: NA 14.75 NA NA NA NA NA NA NA
#> 15: NA NA NA NA NA NA NA NA NA
#> tmin10 tmin11 tmin12 tmin13 tmin14 tmin15 r_hum2 r_hum3
#> 1: 13.35 13.59091 13.62500 13.53846 13.60714 13.7 78.14408 77.64095
#> 2: 13.90 13.90909 13.79167 13.84615 13.92857 NA 84.44942 78.04021
#> 3: 13.85 13.72727 13.79167 13.88462 NA NA 70.92823 68.06884
#> 4: 13.60 13.68182 13.79167 NA NA NA 63.78592 65.86603
#> 5: 13.60 13.72727 NA NA NA NA 66.18816 74.13004
#> 6: 13.85 NA NA NA NA NA 80.02003 82.47741
#> 7: NA NA NA NA NA NA 88.70299 79.31099
#> 8: NA NA NA NA NA NA 73.95957 76.99486
#> 9: NA NA NA NA NA NA 71.79621 73.84333
#> 10: NA NA NA NA NA NA 80.50151 73.17286
#> 11: NA NA NA NA NA NA 68.22658 67.49419
#> 12: NA NA NA NA NA NA 62.27250 67.31563
#> 13: NA NA NA NA NA NA 71.71566 77.33818
#> 14: NA NA NA NA NA NA 82.99256 NA
#> 15: NA NA NA NA NA NA NA NA
#> r_hum4 r_hum5 r_hum6 r_hum7 r_hum8 r_hum9 r_hum10 r_hum11
#> 1: 74.53616 72.09894 71.75349 74.36211 75.99087 74.27265 75.15193 75.40517
#> 2: 74.11767 73.29939 76.08513 77.70042 75.55374 76.38837 76.54329 74.90441
#> 3: 68.55819 72.84932 75.27313 73.16653 74.40390 74.79653 73.16843 72.51943
#> 4: 71.90298 75.00082 72.58851 74.08521 74.56676 72.78329 72.10791 72.58918
#> 5: 77.44557 74.06185 75.56245 75.90176 73.72848 72.87303 73.32592 74.71295
#> 6: 76.98980 78.20493 78.16037 75.35397 74.18840 74.54546 75.94923 NA
#> 7: 80.24960 79.78720 76.24193 74.78300 75.11036 76.60734 NA NA
#> 8: 77.23054 73.48755 72.24453 72.98129 74.93153 NA NA NA
#> 9: 70.01139 69.21500 70.57948 73.15144 NA NA NA NA
#> 10: 71.38700 72.58998 75.25552 NA NA NA NA NA
#> 11: 69.97112 73.69354 NA NA NA NA NA NA
#> 12: 72.63253 NA NA NA NA NA NA NA
#> 13: NA NA NA NA NA NA NA NA
#> 14: NA NA NA NA NA NA NA NA
#> 15: NA NA NA NA NA NA NA NA
#> r_hum12 r_hum13 r_hum14 r_hum15
#> 1: 73.99771 73.38476 73.67170 74.6658
#> 2: 74.16483 74.41383 75.42593 NA
#> 3: 72.92630 74.13068 NA NA
#> 4: 73.92201 NA NA NA
#> 5: NA NA NA NA
#> 6: NA NA NA NA
#> 7: NA NA NA NA
#> 8: NA NA NA NA
#> 9: NA NA NA NA
#> 10: NA NA NA NA
#> 11: NA NA NA NA
#> 12: NA NA NA NA
#> 13: NA NA NA NA
#> 14: NA NA NA NA
#> 15: NA NA NA NA
Created on 2020-11-10 by the reprex package (v0.3.0)
Upvotes: 1
Reputation: 25225
Here is another option:
cols <- c("prcp", "tmax", "tmin", "r_hum")
winsz <- 2L:15L
DT[, as.vector(outer(cols, winsz, FUN=paste, sep="_")) := {
res <- rep(NA, .N)
ans <- outer(.SD, winsz, function(x, k) {
Map(function(v, j) replace(res, 1L:j, sum(v[1L:j]) / j), x, k)
})
}, site, .SDcols=cols]
output of head(DT)
:
site precursor_date lat lon origDate last begin precursor_day_labl year yday dayl prcp srad swe tmax tmin vp satv r_hum prcp_2 tmax_2 tmin_2 r_hum_2 prcp_3 tmax_3 tmin_3 r_hum_3 prcp_4 tmax_4 tmin_4 r_hum_4 prcp_5
1: 100 2011-08-17 46.864 -67.998 2011-08-18 2011 2011 obd_p1 2011 229 50112.0 0 403.2 0 22.5 10.5 1280 19.99250 64.02400 8.5 21.5 12.5 78.14408 7.333333 22.5 13.33333 77.64095 6.5 23.5 13.625 74.53616 6.4
2: 100 2011-08-16 46.864 -67.998 2011-08-18 2011 2011 obd_p2 2011 228 50457.6 17 176.0 0 20.5 14.5 1640 17.77505 92.26416 8.5 21.5 12.5 78.14408 7.333333 22.5 13.33333 77.64095 6.5 23.5 13.625 74.53616 6.4
3: 100 2011-08-15 46.864 -67.998 2011-08-18 2011 2011 obd_p3 2011 227 50457.6 5 249.6 0 24.5 15.0 1720 22.44415 76.63468 NA NA NA NA 7.333333 22.5 13.33333 77.64095 6.5 23.5 13.625 74.53616 6.4
4: 100 2011-08-14 46.864 -67.998 2011-08-18 2011 2011 obd_p4 2011 226 50803.2 4 288.0 0 26.5 14.5 1640 25.14498 65.22178 NA NA NA NA NA NA NA NA 6.5 23.5 13.625 74.53616 6.4
5: 100 2011-08-13 46.864 -67.998 2011-08-18 2011 2011 obd_p5 2011 225 50803.2 6 297.6 0 25.0 12.5 1440 23.09541 62.35006 NA NA NA NA NA NA NA NA NA NA NA NA 6.4
6: 100 2011-08-12 46.864 -67.998 2011-08-18 2011 2011 obd_p6 2011 224 51148.8 6 268.8 0 22.5 12.0 1400 19.99250 70.02625 NA NA NA NA NA NA NA NA NA NA NA NA NA
tmax_5 tmin_5 r_hum_5 prcp_6 tmax_6 tmin_6 r_hum_6 prcp_7 tmax_7 tmin_7 r_hum_7 prcp_8 tmax_8 tmin_8 r_hum_8 prcp_9 tmax_9 tmin_9 r_hum_9 prcp_10 tmax_10 tmin_10 r_hum_10 prcp_11 tmax_11 tmin_11 r_hum_11 prcp_12 tmax_12
1: 23.8 13.4 72.09894 6.333333 23.58333 13.16667 71.75349 7.285714 23.14286 13.28571 74.36211 7.375 22.875 13.375 75.99087 6.555556 23 13.11111 74.27265 7.5 23.05 13.35 75.15193 8.090909 23.22727 13.59091 75.40517 7.916667 23.625
2: 23.8 13.4 72.09894 6.333333 23.58333 13.16667 71.75349 7.285714 23.14286 13.28571 74.36211 7.375 22.875 13.375 75.99087 6.555556 23 13.11111 74.27265 7.5 23.05 13.35 75.15193 8.090909 23.22727 13.59091 75.40517 7.916667 23.625
3: 23.8 13.4 72.09894 6.333333 23.58333 13.16667 71.75349 7.285714 23.14286 13.28571 74.36211 7.375 22.875 13.375 75.99087 6.555556 23 13.11111 74.27265 7.5 23.05 13.35 75.15193 8.090909 23.22727 13.59091 75.40517 7.916667 23.625
4: 23.8 13.4 72.09894 6.333333 23.58333 13.16667 71.75349 7.285714 23.14286 13.28571 74.36211 7.375 22.875 13.375 75.99087 6.555556 23 13.11111 74.27265 7.5 23.05 13.35 75.15193 8.090909 23.22727 13.59091 75.40517 7.916667 23.625
5: 23.8 13.4 72.09894 6.333333 23.58333 13.16667 71.75349 7.285714 23.14286 13.28571 74.36211 7.375 22.875 13.375 75.99087 6.555556 23 13.11111 74.27265 7.5 23.05 13.35 75.15193 8.090909 23.22727 13.59091 75.40517 7.916667 23.625
6: NA NA NA 6.333333 23.58333 13.16667 71.75349 7.285714 23.14286 13.28571 74.36211 7.375 22.875 13.375 75.99087 6.555556 23 13.11111 74.27265 7.5 23.05 13.35 75.15193 8.090909 23.22727 13.59091 75.40517 7.916667 23.625
tmin_12 r_hum_12 prcp_13 tmax_13 tmin_13 r_hum_13 prcp_14 tmax_14 tmin_14 r_hum_14 prcp_15 tmax_15 tmin_15 r_hum_15
1: 13.625 73.99771 7.307692 23.65385 13.53846 73.38476 6.785714 23.64286 13.60714 73.6717 6.8 23.53333 13.7 74.6658
2: 13.625 73.99771 7.307692 23.65385 13.53846 73.38476 6.785714 23.64286 13.60714 73.6717 6.8 23.53333 13.7 74.6658
3: 13.625 73.99771 7.307692 23.65385 13.53846 73.38476 6.785714 23.64286 13.60714 73.6717 6.8 23.53333 13.7 74.6658
4: 13.625 73.99771 7.307692 23.65385 13.53846 73.38476 6.785714 23.64286 13.60714 73.6717 6.8 23.53333 13.7 74.6658
5: 13.625 73.99771 7.307692 23.65385 13.53846 73.38476 6.785714 23.64286 13.60714 73.6717 6.8 23.53333 13.7 74.6658
6: 13.625 73.99771 7.307692 23.65385 13.53846 73.38476 6.785714 23.64286 13.60714 73.6717 6.8 23.53333 13.7 74.6658
You might get a warning:
Warning message: In
[.data.table
(DT, ,:=
(as.vector(outer(cols, winsz, FUN = paste, : Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
But I think its safe to ignore.
Upvotes: 1