Reputation: 83
I am working on a dataframe of baseball data called mlb_team_logs. A random sample lies below.
Date Team season AB PA H X1B X2B X3B HR R RBI BB IBB SO HBP SF SH GDP
1 2015-04-06 ARI 2015 34 39 9 7 1 1 0 4 4 3 0 6 2 0 0 2
2 2015-04-07 ARI 2015 31 36 8 4 1 1 2 7 7 5 0 7 0 0 0 1
3 2015-04-08 ARI 2015 32 35 5 3 2 0 0 2 1 2 0 7 1 0 0 0
4 2015-04-10 ARI 2015 35 38 7 6 0 0 1 4 4 3 0 10 0 0 0 0
5 2015-04-11 ARI 2015 32 35 10 9 0 0 1 6 6 3 0 7 0 0 0 1
6 2015-04-12 ARI 2015 36 38 10 7 3 0 0 4 4 1 0 11 0 0 1 1
7 2015-04-13 ARI 2015 39 44 12 8 3 1 0 8 7 4 0 11 0 0 1 0
8 2015-04-14 ARI 2015 28 32 3 1 2 0 0 1 1 3 0 4 1 0 0 2
9 2015-04-15 ARI 2015 33 34 9 7 1 0 1 2 2 1 0 8 0 0 0 1
10 2015-04-16 ARI 2015 47 51 11 6 2 0 3 7 7 3 1 8 1 0 0 0
240 2015-07-03 ATL 2015 30 32 7 4 1 0 2 2 2 2 0 6 0 0 0 1
241 2015-07-04 ATL 2015 34 40 10 6 3 0 1 9 9 5 0 5 0 0 1 0
242 2015-07-05 ATL 2015 35 37 7 6 1 0 0 0 0 1 0 10 1 0 0 1
243 2015-07-06 ATL 2015 40 44 15 10 4 0 1 5 5 3 0 7 0 0 1 1
244 2015-07-07 ATL 2015 34 37 10 7 1 1 1 4 4 2 0 4 0 0 1 1
245 2015-07-08 ATL 2015 31 38 7 4 1 0 2 5 5 5 1 7 0 0 2 1
246 2015-07-09 ATL 2015 34 37 10 8 2 0 0 3 3 1 0 9 0 1 1 2
247 2015-07-10 ATL 2015 32 35 8 7 0 0 1 3 3 2 0 5 1 0 0 2
248 2015-07-11 ATL 2015 33 38 6 3 1 0 2 2 2 5 1 8 0 0 0 0
249 2015-07-12 ATL 2015 34 41 8 6 2 0 0 3 3 7 1 10 0 0 0 1
250 2015-07-17 ATL 2015 30 36 7 4 3 0 0 4 4 5 1 7 0 0 0 0
In total, the df has 43 total columns. My objective is to sum columns 4 (AB) to 43 on two criteria:
Eventually, I would like these columns to be appended to mlb_team_logs as l7_AB, l7_PA, etc (but I know how to do that if the output will be a new dataframe). Any help is appreciated!
EDIT I altered the sample to allow for more easily tested results
Upvotes: 1
Views: 771
Reputation: 24722
You might be able to use a data.table
non-equi join here. The idea would be to create a lower date bound (below, I've named this date_lb
), and then join the table on itself, matching on Team = Team
, Date < Date
, and Date >= date_lb
. Then use lapply
with .SDcols
to sum the columns of interest.
library(data.table)
setDT(mlb_team_logs)
4:43
in your full dataset)sum_cols = names(mlb_team_logs)[4:19]
df[, date_lb := Date-7]
lapply(.SD, sum)
on the columns of interestresult = mlb_team_logs[mlb_team_logs[, .(Team, Date, date_lb)], on=.(Team, Date<Date, Date>=date_lb)] %>%
.[, lapply(.SD, sum), by=.(Date,Team), .SDcols = sumcols ]
setnames()
)setnames(result, old=sumcols, new=paste0("I7_",sumcols))
Output:
Date Team I7_AB I7_PA I7_H I7_X1B I7_X2B I7_X3B I7_HR I7_R I7_RBI I7_BB I7_IBB I7_SO I7_HBP I7_SF I7_SH I7_GDP
<IDat> <char> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1: 2015-04-06 ARI NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2: 2015-04-07 ARI 34 39 9 7 1 1 0 4 4 3 0 6 2 0 0 2
3: 2015-04-08 ARI 65 75 17 11 2 2 2 11 11 8 0 13 2 0 0 3
4: 2015-04-10 ARI 97 110 22 14 4 2 2 13 12 10 0 20 3 0 0 3
5: 2015-04-11 ARI 132 148 29 20 4 2 3 17 16 13 0 30 3 0 0 3
6: 2015-04-12 ARI 164 183 39 29 4 2 4 23 22 16 0 37 3 0 0 4
7: 2015-04-13 ARI 200 221 49 36 7 2 4 27 26 17 0 48 3 0 1 5
8: 2015-04-14 ARI 205 226 52 37 9 2 4 31 29 18 0 53 1 0 2 3
9: 2015-04-15 ARI 202 222 47 34 10 1 2 25 23 16 0 50 2 0 2 4
10: 2015-04-16 ARI 203 221 51 38 9 1 3 25 24 15 0 51 1 0 2 5
11: 2015-07-03 ATL NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
12: 2015-07-04 ATL 30 32 7 4 1 0 2 2 2 2 0 6 0 0 0 1
13: 2015-07-05 ATL 64 72 17 10 4 0 3 11 11 7 0 11 0 0 1 1
14: 2015-07-06 ATL 99 109 24 16 5 0 3 11 11 8 0 21 1 0 1 2
15: 2015-07-07 ATL 139 153 39 26 9 0 4 16 16 11 0 28 1 0 2 3
16: 2015-07-08 ATL 173 190 49 33 10 1 5 20 20 13 0 32 1 0 3 4
17: 2015-07-09 ATL 204 228 56 37 11 1 7 25 25 18 1 39 1 0 5 5
18: 2015-07-10 ATL 238 265 66 45 13 1 7 28 28 19 1 48 1 1 6 7
19: 2015-07-11 ATL 240 268 67 48 12 1 6 29 29 19 1 47 2 1 6 8
20: 2015-07-12 ATL 239 266 63 45 10 1 7 22 22 19 2 50 2 1 5 8
21: 2015-07-17 ATL 99 114 22 16 3 0 3 8 8 14 2 23 1 0 0 3
Date Team I7_AB I7_PA I7_H I7_X1B I7_X2B I7_X3B I7_HR I7_R I7_RBI I7_BB I7_IBB I7_SO I7_HBP I7_SF I7_SH I7_GDP
Upvotes: 2