John Polo
John Polo

Reputation: 802

How to get slices of differing numbers of rows while subsetting by group in an R data.table

I have a data.table with observations of temp and other weather info over 15 days for several sites. This dput is for all observations for two sites.

library(data.table)

structure(list(site = c("100", "100", "100", "100", "100", "100", 
"100", "100", "100", "100", "100", "100", "100", "100", "100"
), precursor_date = structure(c(15203, 15202, 15201, 15200, 15199, 
15198, 15197, 15196, 15195, 15194, 15193, 15192, 15191, 15190, 
15189), class = "Date"), lat = c(46.864, 46.864, 46.864, 46.864, 
46.864, 46.864, 46.864, 46.864, 46.864, 46.864, 46.864, 46.864, 
46.864, 46.864, 46.864), lon = c(-67.998, -67.998, -67.998, -67.998, 
-67.998, -67.998, -67.998, -67.998, -67.998, -67.998, -67.998, 
-67.998, -67.998, -67.998, -67.998), origDate = structure(c(15204, 
15204, 15204, 15204, 15204, 15204, 15204, 15204, 15204, 15204, 
15204, 15204, 15204, 15204, 15204), class = "Date"), last = c(2011, 
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 
2011, 2011, 2011), begin = c(2011, 2011, 2011, 2011, 2011, 2011, 
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011), precursor_day_labl = structure(1:15, .Label = c("obd_p1", 
"obd_p2", "obd_p3", "obd_p4", "obd_p5", "obd_p6", "obd_p7", "obd_p8", 
"obd_p9", "obd_p10", "obd_p11", "obd_p12", "obd_p13", "obd_p14", 
"obd_p15"), class = "factor"), year = c(2011, 2011, 2011, 2011, 
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011
), yday = c(229, 228, 227, 226, 225, 224, 223, 222, 221, 220, 
219, 218, 217, 216, 215), dayl = c(50112, 50457.6015625, 50457.6015625, 
50803.19921875, 50803.19921875, 51148.80078125, 51148.80078125, 
51494.3984375, 51494.3984375, 51840, 51840, 52185.6015625, 52185.6015625, 
52531.19921875, 52531.19921875), prcp = c(0, 17, 5, 4, 6, 6, 
13, 8, 0, 16, 14, 6, 0, 0, 7), srad = c(403.200012207031, 176, 
249.600006103516, 288, 297.600006103516, 268.799987792969, 179.199996948242, 
192, 406.399993896484, 208, 227.199996948242, 307.200012207031, 
371.200012207031, 304, 182.399993896484), swe = c(0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), tmax = c(22.5, 20.5, 24.5, 
26.5, 25, 22.5, 20.5, 21, 24, 23.5, 25, 28, 24, 23.5, 22), tmin = c(10.5, 
14.5, 15, 14.5, 12.5, 12, 14, 14, 11, 15.5, 16, 14, 12.5, 14.5, 
15), vp = c(1280, 1640, 1720, 1640, 1440, 1400, 1600, 1600, 1320, 
1760, 1800, 1600, 1440, 1640, 1720), satv = c(19.99250234375, 
17.77504867875, 22.44414580875, 25.14497571375, 23.09540625, 
19.99250234375, 17.77504867875, 18.30827693, 21.80845952, 21.18811306125, 
23.09540625, 27.34314816, 21.80845952, 21.18811306125, 19.41676944
), r_hum = c(64.024001497749, 92.2641636397099, 76.6346830330004, 
65.2217770527891, 62.3500614976193, 70.026251638163, 90.0138181850829, 
87.3921672758967, 60.526971141151, 83.0654431053979, 77.9375768720241, 
58.5155736507555, 66.0294230630738, 77.4018901663935, 88.5832221119478
)), class = c("data.table", "data.frame"), row.names = c(NA, 
-15L), .internal.selfref = <pointer: 0x000001b632fd1ef0>)

I want to get the mean of the weather data (prcp, tmax, tmin, r_hum) for each #-day interval of the 15 days moving backward in time from the start day, which I called origDate in the DT. The dates that go into each respective fifteen day window for each site is under precursor_date. There would only be one 2-day mean, one 3-day mean, one 4-day mean, etc. and those would be based on the respective #-day window that immediately preceded the origDate. For example, if the start day is 2011-08-18, I want the mean for 2 days before 8-18( 08-17 and 08-16), 3 days (08-17, 08-16, 08-15), etc., up to the largest window, 15-day (08-17 through 08-03). I don't need every small interval mean possible in the 15 day window. Just the one that immediately precedes origDate.

To give an idea of what the subset I want, in dplyr, I can use

df %>% group_by(site) %>% slice_head(n= x) 
# A tibble: 2,556 x 19
# Groups:   site [1,278]
site  precursor_date   lat   lon origDate    last begin precursor_day_l~  year  yday   dayl  prcp  srad   swe  tmax
   <chr> <date>         <dbl> <dbl> <date>     <dbl> <dbl> <fct>            <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
 1 100   2011-08-17      46.9 -68.0 2011-08-18  2011  2011 obd_p1            2011   229 50112      0  403.     0  22.5
 2 100   2011-08-16      46.9 -68.0 2011-08-18  2011  2011 obd_p2            2011   228 50458.    17  176      0  20.5
 3 101   2011-08-11      44.3 -72.7 2011-08-12  2011  2011 obd_p1            2011   223 50458.     8  272      0  26  
 4 101   2011-08-10      44.3 -72.7 2011-08-12  2011  2011 obd_p2            2011   222 50803.    25  253.     0  26.5
 5 102   2011-08-21      46.5 -68.0 2011-08-22  2011  2011 obd_p1            2011   233 49421.     0  378.     0  27  
 6 102   2011-08-20      46.5 -68.0 2011-08-22  2011  2011 obd_p2            2011   232 49421.     1  397.     0  28  

where x is the number of days I want to subset from each group before getting means. But if I use

df %>% group_by(site) %>% slice_head(n= x) %>% mean(prcp)

I get an error and I don't know why. The error is

Warning message:
In mean.default(., "prcp") :
  argument is not numeric or logical: returning NA

While I don't know why that error occurs, I'm more intent on getting the subsets to work within a data.table. The columns that I want subset means for are prcp, tmax, tmin, and r_hum. I would end up creating 60 new columns, 15 for each weather variable. And a lot of the columns would have NA or something, since the DT has the daily observations in columns. To give an idea of what the output might look like, here is a mock-up. It doesn't have to look this way, as long as I have the means for each weather variable and time window in the DT aligned with the appropriate site.

site precursor_date    lat     lon   origDate ... prcp2dmean prcp3dmean prcp4dmean ... tmax2dmean tmax3dmean ...
 100     2011-08-17 46.864 -67.998 2011-08-18 ...        1.2        1.4        1.4 ...         25         24 ...
 100     2011-08-16 46.864 -67.998 2011-08-18 ...        1.2        1.4        1.4 ...         25         24 ...
 100     2011-08-15 46.864 -67.998 2011-08-18 ...         NA        1.4        1.4 ...         NA         24 ...
 100     2011-08-14 46.864 -67.998 2011-08-18 ...         NA         NA        1.4 ...         NA         NA ...
 100     2011-08-13 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
 100     2011-08-12 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
 100     2011-08-11 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
 100     2011-08-10 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
 100     2011-08-09 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
 100     2011-08-08 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
 100     2011-08-07 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
 100     2011-08-06 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
 100     2011-08-05 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
 100     2011-08-04 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
 100     2011-08-03 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
  10     2011-08-17 46.864 -67.998 2011-08-18 ...        1.2        1.4        1.4 ...         25         24 ...
  10     2011-08-16 46.864 -67.998 2011-08-18 ...        1.2        1.4        1.4 ...         25         24 ...
  10     2011-08-15 46.864 -67.998 2011-08-18 ...         NA        1.4        1.4 ...         NA         24 ...
  10     2011-08-14 46.864 -67.998 2011-08-18 ...         NA         NA        1.4 ...         NA         NA ...
  10     2011-08-13 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
  10     2011-08-12 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
  10     2011-08-11 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
  10     2011-08-10 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
  10     2011-08-09 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
  10     2011-08-08 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
  10     2011-08-07 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
  10     2011-08-06 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
  10     2011-08-05 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
  10     2011-08-04 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...
  10     2011-08-03 46.864 -67.998 2011-08-18 ...         NA         NA         NA ...         NA         NA ...

In my DT I tried

pi_df5[, pi_df5 %>% slice_head(n=2) %>% mean(prcp), by = site]

But that doesn't work.

Upvotes: 5

Views: 111

Answers (2)

TimTeaFan
TimTeaFan

Reputation: 18541

Here is a approach using {data.table}'s frollmean. Note that this approach produces a lot of information you might not need. For example prcp2 shows the two day average for every day it can calculate that average without showing NAs. If I understand you correctly you would only need the two day average calculated for the last day. On the other hand, the 15 day average can only be calculated once, on the last day, while in your example output the 15 day average is shown in every single row.

Note: An earlier version of this post used eval(parse()) which is not necessary, since frollmean can take data.frames and vectors as x and n arguments as @jangorecki pointed out correctly in the comments.

Apologies for the long reprex most of it is the original data:

library(data.table)

# original data
dat <-
  structure(list(site = c("100", "100", "100", "100", "100", "100", "100", "100",
                          "100", "100", "100", "100", "100", "100", "100"),
                 precursor_date = structure(c(15203, 15202, 15201, 15200, 15199, 
                                              15198, 15197, 15196, 15195, 15194, 15193, 15192,
                                              15191, 15190, 15189), class = "Date"),
                 lat = c(46.864, 46.864, 46.864, 46.864, 46.864, 46.864, 46.864,
                         46.864, 46.864, 46.864, 46.864, 46.864,46.864, 46.864,
                         46.864),
                 lon = c(-67.998, -67.998, -67.998, -67.998,  -67.998, -67.998,
                         -67.998, -67.998, -67.998, -67.998, -67.998, -67.998,
                         -67.998, -67.998, -67.998),
                 origDate = structure(c(15204, 15204, 15204, 15204, 15204, 15204,
                                        15204, 15204, 15204, 15204,  15204, 15204,
                                        15204, 15204, 15204),class = "Date"),
                 last = c(2011,  2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,
                          2011, 2011, 2011, 2011, 2011, 2011),
                 begin = c(2011, 2011, 2011, 2011, 2011, 2011,2011, 2011, 2011,
                           2011, 2011, 2011, 2011, 2011, 2011),
                 precursor_day_labl = structure(1:15,
                                                .Label = c("obd_p1",  "obd_p2", "obd_p3", "obd_p4", "obd_p5",
                                                           "obd_p6", "obd_p7", "obd_p8",  "obd_p9", "obd_p10",
                                                           "obd_p11", "obd_p12", "obd_p13", "obd_p14", "obd_p15"),
                                                class = "factor"),
                 year = c(2011, 2011, 2011, 2011,  2011, 2011,2011, 2011, 2011,
                          2011, 2011, 2011, 2011, 2011, 2011),
                 yday = c(229, 228, 227, 226, 225, 224, 223, 222, 221, 220, 219,
                          218, 217, 216, 215),
                 dayl = c(50112, 50457.6015625, 50457.6015625,  50803.19921875,
                          50803.19921875, 51148.80078125, 51148.80078125, 
                          51494.3984375, 51494.3984375, 51840, 51840, 52185.6015625,
                          52185.6015625,  52531.19921875, 52531.19921875),
                 prcp = c(0, 17, 5, 4, 6, 6,  13, 8, 0, 16, 14, 6, 0, 0, 7),
                 srad = c(403.200012207031, 176,  249.600006103516, 288,
                          297.600006103516, 268.799987792969, 179.199996948242, 
                          192, 406.399993896484, 208, 227.199996948242, 307.200012207031,
                          371.200012207031, 304, 182.399993896484),
                 swe = c(0, 0, 0, 0,  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                 tmax = c(22.5, 20.5, 24.5, 26.5, 25, 22.5, 20.5, 21, 24, 23.5,
                          25, 28, 24, 23.5, 22),
                 tmin = c(10.5,  14.5, 15, 14.5, 12.5, 12, 14, 14, 11, 15.5, 16,
                          14, 12.5, 14.5, 15),
                 vp = c(1280, 1640, 1720, 1640, 1440, 1400, 1600, 1600, 1320, 1760,
                        1800, 1600, 1440, 1640, 1720),
                 satv = c(19.99250234375,  17.77504867875, 22.44414580875,
                          25.14497571375, 23.09540625,19.99250234375,
                          17.77504867875, 18.30827693, 21.80845952, 21.18811306125, 
                          23.09540625, 27.34314816, 21.80845952, 21.18811306125,
                          19.41676944),
                 r_hum = c(64.024001497749, 92.2641636397099, 76.6346830330004, 
                           65.2217770527891, 62.3500614976193, 70.026251638163,
                           90.0138181850829,  87.3921672758967, 60.526971141151,
                           83.0654431053979, 77.9375768720241, 58.5155736507555,
                           66.0294230630738, 77.4018901663935, 88.5832221119478)
  ), class = c("data.table", "data.frame"), row.names = c(NA, -15L))


# the actual calculation starts here -------------------------------
mean_vars <- c("prcp", "tmax", "tmin", "r_hum")

setDT(dat)[,
           c(paste0(rep(mean_vars, each = 14) , 2:15)) :=
             frollmean(.SD, 2:15, align = 'left'),
           by = site,
           .SDcols = mean_vars][]

#>     site precursor_date    lat     lon   origDate last begin precursor_day_labl
#>  1:  100     2011-08-17 46.864 -67.998 2011-08-18 2011  2011             obd_p1
#>  2:  100     2011-08-16 46.864 -67.998 2011-08-18 2011  2011             obd_p2
#>  3:  100     2011-08-15 46.864 -67.998 2011-08-18 2011  2011             obd_p3
#>  4:  100     2011-08-14 46.864 -67.998 2011-08-18 2011  2011             obd_p4
#>  5:  100     2011-08-13 46.864 -67.998 2011-08-18 2011  2011             obd_p5
#>  6:  100     2011-08-12 46.864 -67.998 2011-08-18 2011  2011             obd_p6
#>  7:  100     2011-08-11 46.864 -67.998 2011-08-18 2011  2011             obd_p7
#>  8:  100     2011-08-10 46.864 -67.998 2011-08-18 2011  2011             obd_p8
#>  9:  100     2011-08-09 46.864 -67.998 2011-08-18 2011  2011             obd_p9
#> 10:  100     2011-08-08 46.864 -67.998 2011-08-18 2011  2011            obd_p10
#> 11:  100     2011-08-07 46.864 -67.998 2011-08-18 2011  2011            obd_p11
#> 12:  100     2011-08-06 46.864 -67.998 2011-08-18 2011  2011            obd_p12
#> 13:  100     2011-08-05 46.864 -67.998 2011-08-18 2011  2011            obd_p13
#> 14:  100     2011-08-04 46.864 -67.998 2011-08-18 2011  2011            obd_p14
#> 15:  100     2011-08-03 46.864 -67.998 2011-08-18 2011  2011            obd_p15
#>     year yday    dayl prcp  srad swe tmax tmin   vp     satv    r_hum prcp2
#>  1: 2011  229 50112.0    0 403.2   0 22.5 10.5 1280 19.99250 64.02400   8.5
#>  2: 2011  228 50457.6   17 176.0   0 20.5 14.5 1640 17.77505 92.26416  11.0
#>  3: 2011  227 50457.6    5 249.6   0 24.5 15.0 1720 22.44415 76.63468   4.5
#>  4: 2011  226 50803.2    4 288.0   0 26.5 14.5 1640 25.14498 65.22178   5.0
#>  5: 2011  225 50803.2    6 297.6   0 25.0 12.5 1440 23.09541 62.35006   6.0
#>  6: 2011  224 51148.8    6 268.8   0 22.5 12.0 1400 19.99250 70.02625   9.5
#>  7: 2011  223 51148.8   13 179.2   0 20.5 14.0 1600 17.77505 90.01382  10.5
#>  8: 2011  222 51494.4    8 192.0   0 21.0 14.0 1600 18.30828 87.39217   4.0
#>  9: 2011  221 51494.4    0 406.4   0 24.0 11.0 1320 21.80846 60.52697   8.0
#> 10: 2011  220 51840.0   16 208.0   0 23.5 15.5 1760 21.18811 83.06544  15.0
#> 11: 2011  219 51840.0   14 227.2   0 25.0 16.0 1800 23.09541 77.93758  10.0
#> 12: 2011  218 52185.6    6 307.2   0 28.0 14.0 1600 27.34315 58.51557   3.0
#> 13: 2011  217 52185.6    0 371.2   0 24.0 12.5 1440 21.80846 66.02942   0.0
#> 14: 2011  216 52531.2    0 304.0   0 23.5 14.5 1640 21.18811 77.40189   3.5
#> 15: 2011  215 52531.2    7 182.4   0 22.0 15.0 1720 19.41677 88.58322    NA
#>         prcp3 prcp4 prcp5    prcp6    prcp7 prcp8    prcp9 prcp10   prcp11
#>  1:  7.333333  6.50   6.4 6.333333 7.285714 7.375 6.555556    7.5 8.090909
#>  2:  8.666667  8.00   7.6 8.500000 8.428571 7.375 8.333333    8.9 8.636364
#>  3:  5.000000  5.25   6.8 7.000000 6.000000 7.250 8.000000    7.8 7.090909
#>  4:  5.333333  7.25   7.4 6.166667 7.571429 8.375 8.111111    7.3 6.636364
#>  5:  8.333333  8.25   6.6 8.166667 9.000000 8.625 7.666667    6.9 6.909091
#>  6:  9.000000  6.75   8.6 9.500000 9.000000 7.875 7.000000    7.0       NA
#>  7:  7.000000  9.25  10.2 9.500000 8.142857 7.125 7.111111     NA       NA
#>  8:  8.000000  9.50   8.8 7.333333 6.285714 6.375       NA     NA       NA
#>  9: 10.000000  9.00   7.2 6.000000 6.142857    NA       NA     NA       NA
#> 10: 12.000000  9.00   7.2 7.166667       NA    NA       NA     NA       NA
#> 11:  6.666667  5.00   5.4       NA       NA    NA       NA     NA       NA
#> 12:  2.000000  3.25    NA       NA       NA    NA       NA     NA       NA
#> 13:  2.333333    NA    NA       NA       NA    NA       NA     NA       NA
#> 14:        NA    NA    NA       NA       NA    NA       NA     NA       NA
#> 15:        NA    NA    NA       NA       NA    NA       NA     NA       NA
#>       prcp12   prcp13   prcp14 prcp15 tmax2    tmax3  tmax4 tmax5    tmax6
#>  1: 7.916667 7.307692 6.785714    6.8 21.50 22.50000 23.500  23.8 23.58333
#>  2: 7.916667 7.307692 7.285714     NA 22.50 23.83333 24.125  23.8 23.25000
#>  3: 6.500000 6.538462       NA     NA 25.50 25.33333 24.625  23.8 23.33333
#>  4: 6.666667       NA       NA     NA 25.75 24.66667 23.625  23.1 23.25000
#>  5:       NA       NA       NA     NA 23.75 22.66667 22.250  22.6 22.75000
#>  6:       NA       NA       NA     NA 21.50 21.33333 22.000  22.3 22.75000
#>  7:       NA       NA       NA     NA 20.75 21.83333 22.250  22.8 23.66667
#>  8:       NA       NA       NA     NA 22.50 22.83333 23.375  24.3 24.25000
#>  9:       NA       NA       NA     NA 23.75 24.16667 25.125  24.9 24.66667
#> 10:       NA       NA       NA     NA 24.25 25.50000 25.125  24.8 24.33333
#> 11:       NA       NA       NA     NA 26.50 25.66667 25.125  24.5       NA
#> 12:       NA       NA       NA     NA 26.00 25.16667 24.375    NA       NA
#> 13:       NA       NA       NA     NA 23.75 23.16667     NA    NA       NA
#> 14:       NA       NA       NA     NA 22.75       NA     NA    NA       NA
#> 15:       NA       NA       NA     NA    NA       NA     NA    NA       NA
#>        tmax7   tmax8    tmax9 tmax10   tmax11   tmax12   tmax13   tmax14
#>  1: 23.14286 22.8750 23.00000  23.05 23.22727 23.62500 23.65385 23.64286
#>  2: 22.92857 23.0625 23.11111  23.30 23.72727 23.75000 23.73077 23.60714
#>  3: 23.42857 23.4375 23.61111  24.05 24.04545 24.00000 23.84615       NA
#>  4: 23.28571 23.5000 24.00000  24.00 23.95455 23.79167       NA       NA
#>  5: 23.07143 23.6875 23.72222  23.70 23.54545       NA       NA       NA
#>  6: 23.50000 23.5625 23.55556  23.40       NA       NA       NA       NA
#>  7: 23.71429 23.6875 23.50000     NA       NA       NA       NA       NA
#>  8: 24.14286 23.8750       NA     NA       NA       NA       NA       NA
#>  9: 24.28571      NA       NA     NA       NA       NA       NA       NA
#> 10:       NA      NA       NA     NA       NA       NA       NA       NA
#> 11:       NA      NA       NA     NA       NA       NA       NA       NA
#> 12:       NA      NA       NA     NA       NA       NA       NA       NA
#> 13:       NA      NA       NA     NA       NA       NA       NA       NA
#> 14:       NA      NA       NA     NA       NA       NA       NA       NA
#> 15:       NA      NA       NA     NA       NA       NA       NA       NA
#>       tmax15 tmin2    tmin3  tmin4 tmin5    tmin6    tmin7   tmin8    tmin9
#>  1: 23.53333 12.50 13.33333 13.625  13.4 13.16667 13.28571 13.3750 13.11111
#>  2:       NA 14.75 14.66667 14.125  13.7 13.75000 13.78571 13.4375 13.66667
#>  3:       NA 14.75 14.00000 13.500  13.6 13.66667 13.28571 13.5625 13.83333
#>  4:       NA 13.50 13.00000 13.250  13.4 13.00000 13.35714 13.6875 13.72222
#>  5:       NA 12.25 12.83333 13.125  12.7 13.16667 13.57143 13.6250 13.50000
#>  6:       NA 13.00 13.33333 12.750  13.3 13.75000 13.78571 13.6250 13.72222
#>  7:       NA 14.00 13.00000 13.625  14.1 14.08333 13.85714 13.9375 14.05556
#>  8:       NA 12.50 13.50000 14.125  14.1 13.83333 13.92857 14.0625       NA
#>  9:       NA 13.25 14.16667 14.125  13.8 13.91667 14.07143      NA       NA
#> 10:       NA 15.75 15.16667 14.500  14.5 14.58333       NA      NA       NA
#> 11:       NA 15.00 14.16667 14.250  14.4       NA       NA      NA       NA
#> 12:       NA 13.25 13.66667 14.000    NA       NA       NA      NA       NA
#> 13:       NA 13.50 14.00000     NA    NA       NA       NA      NA       NA
#> 14:       NA 14.75       NA     NA    NA       NA       NA      NA       NA
#> 15:       NA    NA       NA     NA    NA       NA       NA      NA       NA
#>     tmin10   tmin11   tmin12   tmin13   tmin14 tmin15   r_hum2   r_hum3
#>  1:  13.35 13.59091 13.62500 13.53846 13.60714   13.7 78.14408 77.64095
#>  2:  13.90 13.90909 13.79167 13.84615 13.92857     NA 84.44942 78.04021
#>  3:  13.85 13.72727 13.79167 13.88462       NA     NA 70.92823 68.06884
#>  4:  13.60 13.68182 13.79167       NA       NA     NA 63.78592 65.86603
#>  5:  13.60 13.72727       NA       NA       NA     NA 66.18816 74.13004
#>  6:  13.85       NA       NA       NA       NA     NA 80.02003 82.47741
#>  7:     NA       NA       NA       NA       NA     NA 88.70299 79.31099
#>  8:     NA       NA       NA       NA       NA     NA 73.95957 76.99486
#>  9:     NA       NA       NA       NA       NA     NA 71.79621 73.84333
#> 10:     NA       NA       NA       NA       NA     NA 80.50151 73.17286
#> 11:     NA       NA       NA       NA       NA     NA 68.22658 67.49419
#> 12:     NA       NA       NA       NA       NA     NA 62.27250 67.31563
#> 13:     NA       NA       NA       NA       NA     NA 71.71566 77.33818
#> 14:     NA       NA       NA       NA       NA     NA 82.99256       NA
#> 15:     NA       NA       NA       NA       NA     NA       NA       NA
#>       r_hum4   r_hum5   r_hum6   r_hum7   r_hum8   r_hum9  r_hum10  r_hum11
#>  1: 74.53616 72.09894 71.75349 74.36211 75.99087 74.27265 75.15193 75.40517
#>  2: 74.11767 73.29939 76.08513 77.70042 75.55374 76.38837 76.54329 74.90441
#>  3: 68.55819 72.84932 75.27313 73.16653 74.40390 74.79653 73.16843 72.51943
#>  4: 71.90298 75.00082 72.58851 74.08521 74.56676 72.78329 72.10791 72.58918
#>  5: 77.44557 74.06185 75.56245 75.90176 73.72848 72.87303 73.32592 74.71295
#>  6: 76.98980 78.20493 78.16037 75.35397 74.18840 74.54546 75.94923       NA
#>  7: 80.24960 79.78720 76.24193 74.78300 75.11036 76.60734       NA       NA
#>  8: 77.23054 73.48755 72.24453 72.98129 74.93153       NA       NA       NA
#>  9: 70.01139 69.21500 70.57948 73.15144       NA       NA       NA       NA
#> 10: 71.38700 72.58998 75.25552       NA       NA       NA       NA       NA
#> 11: 69.97112 73.69354       NA       NA       NA       NA       NA       NA
#> 12: 72.63253       NA       NA       NA       NA       NA       NA       NA
#> 13:       NA       NA       NA       NA       NA       NA       NA       NA
#> 14:       NA       NA       NA       NA       NA       NA       NA       NA
#> 15:       NA       NA       NA       NA       NA       NA       NA       NA
#>      r_hum12  r_hum13  r_hum14 r_hum15
#>  1: 73.99771 73.38476 73.67170 74.6658
#>  2: 74.16483 74.41383 75.42593      NA
#>  3: 72.92630 74.13068       NA      NA
#>  4: 73.92201       NA       NA      NA
#>  5:       NA       NA       NA      NA
#>  6:       NA       NA       NA      NA
#>  7:       NA       NA       NA      NA
#>  8:       NA       NA       NA      NA
#>  9:       NA       NA       NA      NA
#> 10:       NA       NA       NA      NA
#> 11:       NA       NA       NA      NA
#> 12:       NA       NA       NA      NA
#> 13:       NA       NA       NA      NA
#> 14:       NA       NA       NA      NA
#> 15:       NA       NA       NA      NA

Created on 2020-11-10 by the reprex package (v0.3.0)

Upvotes: 1

chinsoon12
chinsoon12

Reputation: 25225

Here is another option:

cols <- c("prcp", "tmax", "tmin", "r_hum")
winsz <- 2L:15L
DT[, as.vector(outer(cols, winsz, FUN=paste, sep="_")) := {
        res <- rep(NA, .N)
        ans <- outer(.SD, winsz, function(x, k) {
            Map(function(v, j) replace(res, 1L:j, sum(v[1L:j]) / j), x, k)
        })
    }, site, .SDcols=cols]

output of head(DT):

   site precursor_date    lat     lon   origDate last begin precursor_day_labl year yday    dayl prcp  srad swe tmax tmin   vp     satv    r_hum prcp_2 tmax_2 tmin_2  r_hum_2   prcp_3 tmax_3   tmin_3  r_hum_3 prcp_4 tmax_4 tmin_4  r_hum_4 prcp_5
1:  100     2011-08-17 46.864 -67.998 2011-08-18 2011  2011             obd_p1 2011  229 50112.0    0 403.2   0 22.5 10.5 1280 19.99250 64.02400    8.5   21.5   12.5 78.14408 7.333333   22.5 13.33333 77.64095    6.5   23.5 13.625 74.53616    6.4
2:  100     2011-08-16 46.864 -67.998 2011-08-18 2011  2011             obd_p2 2011  228 50457.6   17 176.0   0 20.5 14.5 1640 17.77505 92.26416    8.5   21.5   12.5 78.14408 7.333333   22.5 13.33333 77.64095    6.5   23.5 13.625 74.53616    6.4
3:  100     2011-08-15 46.864 -67.998 2011-08-18 2011  2011             obd_p3 2011  227 50457.6    5 249.6   0 24.5 15.0 1720 22.44415 76.63468     NA     NA     NA       NA 7.333333   22.5 13.33333 77.64095    6.5   23.5 13.625 74.53616    6.4
4:  100     2011-08-14 46.864 -67.998 2011-08-18 2011  2011             obd_p4 2011  226 50803.2    4 288.0   0 26.5 14.5 1640 25.14498 65.22178     NA     NA     NA       NA       NA     NA       NA       NA    6.5   23.5 13.625 74.53616    6.4
5:  100     2011-08-13 46.864 -67.998 2011-08-18 2011  2011             obd_p5 2011  225 50803.2    6 297.6   0 25.0 12.5 1440 23.09541 62.35006     NA     NA     NA       NA       NA     NA       NA       NA     NA     NA     NA       NA    6.4
6:  100     2011-08-12 46.864 -67.998 2011-08-18 2011  2011             obd_p6 2011  224 51148.8    6 268.8   0 22.5 12.0 1400 19.99250 70.02625     NA     NA     NA       NA       NA     NA       NA       NA     NA     NA     NA       NA     NA
   tmax_5 tmin_5  r_hum_5   prcp_6   tmax_6   tmin_6  r_hum_6   prcp_7   tmax_7   tmin_7  r_hum_7 prcp_8 tmax_8 tmin_8  r_hum_8   prcp_9 tmax_9   tmin_9  r_hum_9 prcp_10 tmax_10 tmin_10 r_hum_10  prcp_11  tmax_11  tmin_11 r_hum_11  prcp_12 tmax_12
1:   23.8   13.4 72.09894 6.333333 23.58333 13.16667 71.75349 7.285714 23.14286 13.28571 74.36211  7.375 22.875 13.375 75.99087 6.555556     23 13.11111 74.27265     7.5   23.05   13.35 75.15193 8.090909 23.22727 13.59091 75.40517 7.916667  23.625
2:   23.8   13.4 72.09894 6.333333 23.58333 13.16667 71.75349 7.285714 23.14286 13.28571 74.36211  7.375 22.875 13.375 75.99087 6.555556     23 13.11111 74.27265     7.5   23.05   13.35 75.15193 8.090909 23.22727 13.59091 75.40517 7.916667  23.625
3:   23.8   13.4 72.09894 6.333333 23.58333 13.16667 71.75349 7.285714 23.14286 13.28571 74.36211  7.375 22.875 13.375 75.99087 6.555556     23 13.11111 74.27265     7.5   23.05   13.35 75.15193 8.090909 23.22727 13.59091 75.40517 7.916667  23.625
4:   23.8   13.4 72.09894 6.333333 23.58333 13.16667 71.75349 7.285714 23.14286 13.28571 74.36211  7.375 22.875 13.375 75.99087 6.555556     23 13.11111 74.27265     7.5   23.05   13.35 75.15193 8.090909 23.22727 13.59091 75.40517 7.916667  23.625
5:   23.8   13.4 72.09894 6.333333 23.58333 13.16667 71.75349 7.285714 23.14286 13.28571 74.36211  7.375 22.875 13.375 75.99087 6.555556     23 13.11111 74.27265     7.5   23.05   13.35 75.15193 8.090909 23.22727 13.59091 75.40517 7.916667  23.625
6:     NA     NA       NA 6.333333 23.58333 13.16667 71.75349 7.285714 23.14286 13.28571 74.36211  7.375 22.875 13.375 75.99087 6.555556     23 13.11111 74.27265     7.5   23.05   13.35 75.15193 8.090909 23.22727 13.59091 75.40517 7.916667  23.625
   tmin_12 r_hum_12  prcp_13  tmax_13  tmin_13 r_hum_13  prcp_14  tmax_14  tmin_14 r_hum_14 prcp_15  tmax_15 tmin_15 r_hum_15
1:  13.625 73.99771 7.307692 23.65385 13.53846 73.38476 6.785714 23.64286 13.60714  73.6717     6.8 23.53333    13.7  74.6658
2:  13.625 73.99771 7.307692 23.65385 13.53846 73.38476 6.785714 23.64286 13.60714  73.6717     6.8 23.53333    13.7  74.6658
3:  13.625 73.99771 7.307692 23.65385 13.53846 73.38476 6.785714 23.64286 13.60714  73.6717     6.8 23.53333    13.7  74.6658
4:  13.625 73.99771 7.307692 23.65385 13.53846 73.38476 6.785714 23.64286 13.60714  73.6717     6.8 23.53333    13.7  74.6658
5:  13.625 73.99771 7.307692 23.65385 13.53846 73.38476 6.785714 23.64286 13.60714  73.6717     6.8 23.53333    13.7  74.6658
6:  13.625 73.99771 7.307692 23.65385 13.53846 73.38476 6.785714 23.64286 13.60714  73.6717     6.8 23.53333    13.7  74.6658

You might get a warning:

Warning message: In [.data.table(DT, , :=(as.vector(outer(cols, winsz, FUN = paste, : Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.

But I think its safe to ignore.

Upvotes: 1

Related Questions