runnin4tay5000
runnin4tay5000

Reputation: 87

R Language - Rolling averages across columns

I am trying to take a 3 game rolling average of some NFL data across columns, here is the data and resultant data frame:

data:

Player <- c("Player1", "Player2", "Player3", "Player4", "Player5")
Week1 <- c(10, 5, 6, 8, 7)
Week2 <- c(12, 9, 4, 2, 8)
Week3 <- c(4, 5, 4, 3, 12)
Week4 <- c(15, 7, 12, NA, 5)
Week5 <- c(NA, 5, 8, 11, 6)
q <- data.frame(Player, Week1, Week2, Week3, Week4, Week5)

data frame:

   Player Week1 Week2 Week3 Week4 Week5
1 Player1    10    12     4    15    NA
2 Player2     5     9     5     7     5
3 Player3     6     4     4    12     8
4 Player4     8     2     3    NA    11
5 Player5     7     8    12     5     6

So what I want to do is take 3 game rolling averages across the weeks, starting with week 1. So for the players it would average Week1, Week2, Week3 and give me the value in a new column, then it would average Week2, Week3, Week4, and give me that value in a new column, etc...

In this case the new data frame should look like this:

   Player Week1 Week2 Week3 Week4 Week5    Avg1    Avg2    Avg3
1 Player1    10    12     4    15    NA     8.7    10.3     NA
2 Player2     5     9     5     7     5     6.3     7.0     5.7
3 Player3     6     4     4    12     8     4.7     6.7     8.0
4 Player4     8     2     3    NA    11     4.3     4.3     5.3
5 Player5     7     8    12     5     6     9.0     8.3     7.7

Please notice that for Player4 in Week4 there was an NA that was disregarded...that would be a week the player didn't play for some reason, so I use the previous two games and the game after it for Avg3.

I need these new columns as I am going to be running a regression to see if the average of the 3 predicts the next value. Everything I can find about this has the rolling average down a single column, but I am very inexperienced, so any help with data formatting for a problem like this is appreciated. Thanks in advance for the help!

Upvotes: 1

Views: 76

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388982

We can use rollmean from the zoo package

library(zoo)
t(apply(q[-1], 1, function(x) rollmean(x, 3))))


#       Week2     Week3    Week4
#[1,] 8.666667 10.333333       NA
#[2,] 6.333333  7.000000 5.666667
#[3,] 4.666667  6.666667 8.000000
#[4,] 4.333333        NA       NA
#[5,] 9.000000  8.333333 7.666667

Finally, to get a combined dataframe,

cbind(q, t(apply(q[-1], 1, function(x) rollmean(x, 3))))

#   Player Week1 Week2 Week3 Week4 Week5    Week2     Week3    Week4
#1 Player1    10    12     4    15    NA 8.666667 10.333333       NA
#2 Player2     5     9     5     7     5 6.333333  7.000000 5.666667
#3 Player3     6     4     4    12     8 4.666667  6.666667 8.000000
#4 Player4     8     2     3    NA    11 4.333333        NA       NA
#5 Player5     7     8    12     5     6 9.000000  8.333333 7.666667

If you are specific about the column names, you can always change it by

temp <- t(apply(q[-1], 1, function(x) rollmean(x, 3)))
colnames(temp) <- c("avg1", "avg2", "avg3")

and then use cbind on temp

EDIT

To answer few of the OP's questions -

If you have more than one column to remove at the start, you can just select/deselect the columns with their index numbers

For example,

To deselect first two columns, you can use q[-c(1:2)] which is equal to q[3:7] which gives it a range of values to deselect/select.

function(x) is called as anonymous function using which you can apply your own functions to every row of the data frame.

rollmean does not handle NA values. From the document of ?rollmean

The default method of rollmean does not handle inputs that contain NAs. In such cases, use rollapply instead.

Upvotes: 2

Related Questions