Anthony Amico
Anthony Amico

Reputation: 83

Sum in R based on a date range and another condition?

I am working on a dataframe of baseball data called mlb_team_logs. A random sample lies below.

       Date    Team season AB PA  H  X1B X2B X3B HR R RBI BB IBB SO HBP SF SH GDP
1   2015-04-06  ARI   2015 34 39  9   7   1   1  0 4   4  3   0  6   2  0  0   2
2   2015-04-07  ARI   2015 31 36  8   4   1   1  2 7   7  5   0  7   0  0  0   1
3   2015-04-08  ARI   2015 32 35  5   3   2   0  0 2   1  2   0  7   1  0  0   0
4   2015-04-10  ARI   2015 35 38  7   6   0   0  1 4   4  3   0 10   0  0  0   0
5   2015-04-11  ARI   2015 32 35 10   9   0   0  1 6   6  3   0  7   0  0  0   1
6   2015-04-12  ARI   2015 36 38 10   7   3   0  0 4   4  1   0 11   0  0  1   1
7   2015-04-13  ARI   2015 39 44 12   8   3   1  0 8   7  4   0 11   0  0  1   0
8   2015-04-14  ARI   2015 28 32  3   1   2   0  0 1   1  3   0  4   1  0  0   2
9   2015-04-15  ARI   2015 33 34  9   7   1   0  1 2   2  1   0  8   0  0  0   1
10  2015-04-16  ARI   2015 47 51 11   6   2   0  3 7   7  3   1  8   1  0  0   0
240 2015-07-03  ATL   2015 30 32  7   4   1   0  2 2   2  2   0  6   0  0  0   1
241 2015-07-04  ATL   2015 34 40 10   6   3   0  1 9   9  5   0  5   0  0  1   0
242 2015-07-05  ATL   2015 35 37  7   6   1   0  0 0   0  1   0 10   1  0  0   1
243 2015-07-06  ATL   2015 40 44 15  10   4   0  1 5   5  3   0  7   0  0  1   1
244 2015-07-07  ATL   2015 34 37 10   7   1   1  1 4   4  2   0  4   0  0  1   1
245 2015-07-08  ATL   2015 31 38  7   4   1   0  2 5   5  5   1  7   0  0  2   1
246 2015-07-09  ATL   2015 34 37 10   8   2   0  0 3   3  1   0  9   0  1  1   2
247 2015-07-10  ATL   2015 32 35  8   7   0   0  1 3   3  2   0  5   1  0  0   2
248 2015-07-11  ATL   2015 33 38  6   3   1   0  2 2   2  5   1  8   0  0  0   0
249 2015-07-12  ATL   2015 34 41  8   6   2   0  0 3   3  7   1 10   0  0  0   1
250 2015-07-17  ATL   2015 30 36  7   4   3   0  0 4   4  5   1  7   0  0  0   0

In total, the df has 43 total columns. My objective is to sum columns 4 (AB) to 43 on two criteria:

  1. the team
  2. the date is within 7 days of the entry in "Date" (ie Date - 7 to Date - 1)

Eventually, I would like these columns to be appended to mlb_team_logs as l7_AB, l7_PA, etc (but I know how to do that if the output will be a new dataframe). Any help is appreciated!

EDIT I altered the sample to allow for more easily tested results

Upvotes: 1

Views: 771

Answers (1)

langtang
langtang

Reputation: 24722

You might be able to use a data.table non-equi join here. The idea would be to create a lower date bound (below, I've named this date_lb), and then join the table on itself, matching on Team = Team, Date < Date, and Date >= date_lb. Then use lapply with .SDcols to sum the columns of interest.

  1. load library and set your frame to data.table
library(data.table)
setDT(mlb_team_logs)
  1. Identify the columns you want to sum, in a character vector (change to 4:43 in your full dataset)
sum_cols = names(mlb_team_logs)[4:19]
  1. Add a lower bound on date
df[, date_lb := Date-7]
  1. Join the table on itself, and use lapply(.SD, sum) on the columns of interest
result = mlb_team_logs[mlb_team_logs[, .(Team, Date, date_lb)], on=.(Team, Date<Date, Date>=date_lb)] %>% 
  .[, lapply(.SD, sum), by=.(Date,Team), .SDcols = sumcols ]
  1. Set the new names (inplace, using setnames())
setnames(result, old=sumcols, new=paste0("I7_",sumcols))

Output:

          Date   Team I7_AB I7_PA  I7_H I7_X1B I7_X2B I7_X3B I7_HR  I7_R I7_RBI I7_BB I7_IBB I7_SO I7_HBP I7_SF I7_SH I7_GDP
        <IDat> <char> <int> <int> <int>  <int>  <int>  <int> <int> <int>  <int> <int>  <int> <int>  <int> <int> <int>  <int>
 1: 2015-04-06    ARI    NA    NA    NA     NA     NA     NA    NA    NA     NA    NA     NA    NA     NA    NA    NA     NA
 2: 2015-04-07    ARI    34    39     9      7      1      1     0     4      4     3      0     6      2     0     0      2
 3: 2015-04-08    ARI    65    75    17     11      2      2     2    11     11     8      0    13      2     0     0      3
 4: 2015-04-10    ARI    97   110    22     14      4      2     2    13     12    10      0    20      3     0     0      3
 5: 2015-04-11    ARI   132   148    29     20      4      2     3    17     16    13      0    30      3     0     0      3
 6: 2015-04-12    ARI   164   183    39     29      4      2     4    23     22    16      0    37      3     0     0      4
 7: 2015-04-13    ARI   200   221    49     36      7      2     4    27     26    17      0    48      3     0     1      5
 8: 2015-04-14    ARI   205   226    52     37      9      2     4    31     29    18      0    53      1     0     2      3
 9: 2015-04-15    ARI   202   222    47     34     10      1     2    25     23    16      0    50      2     0     2      4
10: 2015-04-16    ARI   203   221    51     38      9      1     3    25     24    15      0    51      1     0     2      5
11: 2015-07-03    ATL    NA    NA    NA     NA     NA     NA    NA    NA     NA    NA     NA    NA     NA    NA    NA     NA
12: 2015-07-04    ATL    30    32     7      4      1      0     2     2      2     2      0     6      0     0     0      1
13: 2015-07-05    ATL    64    72    17     10      4      0     3    11     11     7      0    11      0     0     1      1
14: 2015-07-06    ATL    99   109    24     16      5      0     3    11     11     8      0    21      1     0     1      2
15: 2015-07-07    ATL   139   153    39     26      9      0     4    16     16    11      0    28      1     0     2      3
16: 2015-07-08    ATL   173   190    49     33     10      1     5    20     20    13      0    32      1     0     3      4
17: 2015-07-09    ATL   204   228    56     37     11      1     7    25     25    18      1    39      1     0     5      5
18: 2015-07-10    ATL   238   265    66     45     13      1     7    28     28    19      1    48      1     1     6      7
19: 2015-07-11    ATL   240   268    67     48     12      1     6    29     29    19      1    47      2     1     6      8
20: 2015-07-12    ATL   239   266    63     45     10      1     7    22     22    19      2    50      2     1     5      8
21: 2015-07-17    ATL    99   114    22     16      3      0     3     8      8    14      2    23      1     0     0      3
          Date   Team I7_AB I7_PA  I7_H I7_X1B I7_X2B I7_X3B I7_HR  I7_R I7_RBI I7_BB I7_IBB I7_SO I7_HBP I7_SF I7_SH I7_GDP

Upvotes: 2

Related Questions