Reputation: 87
I am trying to take a 3 game rolling average of some NFL data across columns, here is the data and resultant data frame:
data:
Player <- c("Player1", "Player2", "Player3", "Player4", "Player5")
Week1 <- c(10, 5, 6, 8, 7)
Week2 <- c(12, 9, 4, 2, 8)
Week3 <- c(4, 5, 4, 3, 12)
Week4 <- c(15, 7, 12, NA, 5)
Week5 <- c(NA, 5, 8, 11, 6)
q <- data.frame(Player, Week1, Week2, Week3, Week4, Week5)
data frame:
Player Week1 Week2 Week3 Week4 Week5
1 Player1 10 12 4 15 NA
2 Player2 5 9 5 7 5
3 Player3 6 4 4 12 8
4 Player4 8 2 3 NA 11
5 Player5 7 8 12 5 6
So what I want to do is take 3 game rolling averages across the weeks, starting with week 1. So for the players it would average Week1, Week2, Week3 and give me the value in a new column, then it would average Week2, Week3, Week4, and give me that value in a new column, etc...
In this case the new data frame should look like this:
Player Week1 Week2 Week3 Week4 Week5 Avg1 Avg2 Avg3
1 Player1 10 12 4 15 NA 8.7 10.3 NA
2 Player2 5 9 5 7 5 6.3 7.0 5.7
3 Player3 6 4 4 12 8 4.7 6.7 8.0
4 Player4 8 2 3 NA 11 4.3 4.3 5.3
5 Player5 7 8 12 5 6 9.0 8.3 7.7
Please notice that for Player4 in Week4 there was an NA that was disregarded...that would be a week the player didn't play for some reason, so I use the previous two games and the game after it for Avg3.
I need these new columns as I am going to be running a regression to see if the average of the 3 predicts the next value. Everything I can find about this has the rolling average down a single column, but I am very inexperienced, so any help with data formatting for a problem like this is appreciated. Thanks in advance for the help!
Upvotes: 1
Views: 76
Reputation: 388982
We can use rollmean
from the zoo
package
library(zoo)
t(apply(q[-1], 1, function(x) rollmean(x, 3))))
# Week2 Week3 Week4
#[1,] 8.666667 10.333333 NA
#[2,] 6.333333 7.000000 5.666667
#[3,] 4.666667 6.666667 8.000000
#[4,] 4.333333 NA NA
#[5,] 9.000000 8.333333 7.666667
Finally, to get a combined dataframe,
cbind(q, t(apply(q[-1], 1, function(x) rollmean(x, 3))))
# Player Week1 Week2 Week3 Week4 Week5 Week2 Week3 Week4
#1 Player1 10 12 4 15 NA 8.666667 10.333333 NA
#2 Player2 5 9 5 7 5 6.333333 7.000000 5.666667
#3 Player3 6 4 4 12 8 4.666667 6.666667 8.000000
#4 Player4 8 2 3 NA 11 4.333333 NA NA
#5 Player5 7 8 12 5 6 9.000000 8.333333 7.666667
If you are specific about the column names, you can always change it by
temp <- t(apply(q[-1], 1, function(x) rollmean(x, 3)))
colnames(temp) <- c("avg1", "avg2", "avg3")
and then use cbind
on temp
EDIT
To answer few of the OP's questions -
If you have more than one column to remove at the start, you can just select/deselect the columns with their index numbers
For example,
To deselect first two columns, you can use q[-c(1:2)]
which is equal to q[3:7]
which gives it a range of values to deselect/select.
function(x)
is called as anonymous function using which you can apply your own functions to every row of the data frame.
rollmean
does not handle NA
values. From the document of ?rollmean
The default method of rollmean does not handle inputs that contain NAs. In such cases, use rollapply instead.
Upvotes: 2