Reputation: 159
I have a dataframe df
in R of sports teams, including a column Season
, Date
, and Goals
for every match, like so:
Season Date Home Away Goals
1 2013 06/04/2013 Arsenal Chelsea 3
2 2013 06/04/2013 Tottenham Newcastle 5
3 2013 10/04/2013 Milan Rangers 4
4 2013 07/05/2013 Real Bayern 1
5 2013 07/05/2013 Juventus Inter 2
6 2013 07/06/2013 Barcelona Chelsea 1
7 2014 03/04/2014 Braga Benfica 3
8 2014 13/04/2014 Rangers Chelsea 1
9 2014 08/05/2014 Bayern Tottenham 0
10 2014 10/05/2014 Newcastle Braga 2
I would like to group this data into months for each season and calculate :
My aim is to get a new dataframe like this:
Season month TotalGoalsMonth AvGoalsMonth AvGoalsSeason
1 2013 4 12 4.0 2.67
2 2013 5 3 1.5 2.67
3 2013 6 1 1.0 2.67
4 2014 4 4 2.0 1.50
5 2014 5 2 2.0 1.50
So for season 2013, there were months 4, 5, 6. In months 4 there were 3 matches played yielding a total of 12 goals (TotalGoalsMonth
column). The average goals per month is 12/3=4.0 (AvGoalsMonth
column). And the same for months 5, 6, then repeat for season 2014, etc.
A total of 6 matches were played in 2013, with a total of 16 goals, so average goals per 2013 season is 16 / 6 = 2.67 (AvGoalsSeason
column). Then repeat for season 2014, and so on.
Please explain how to do this in R.
Upvotes: 0
Views: 30
Reputation: 389215
Turn the dates into Date class, extract month from it. Calculate Total goals, Average goals and number of matches for each month.
library(dplyr)
library(lubridate)
df %>%
mutate(Date = dmy(Date),
Month = month(Date)) %>%
summarise(TotalGoalsMonth = sum(Goals),
AvGoalsMonth = mean(Goals),
NumOfMatches = n(),
.by = c(Season, Month)) %>%
mutate(AvGoalsSeason = sum(TotalGoalsMonth)/sum(NumOfMatches), .by = Season)
# Season Month TotalGoalsMonth AvGoalsMonth NumOfMatches AvGoalsSeason
#1 2013 4 12 4.0 3 2.666667
#2 2013 5 3 1.5 2 2.666667
#3 2013 6 1 1.0 1 2.666667
#4 2014 4 4 2.0 2 1.500000
#5 2014 5 2 1.0 2 1.500000
If your data is huge, you may use collapse
whose syntax is similar to that of dplyr
but it is much faster. (Even faster than base R in many cases.) Here's a dplyr
-free version.
library(collapse)
df |>
fmutate(Date = as.Date(Date, '%d/%m/%Y'),
Month = format(Date, '%m')) %>%
fgroup_by(Season, Month) |>
fsummarise(TotalGoalsMonth = sum(Goals),
AvGoalsMonth = mean(Goals),
NumOfMatches = GRPN()) |>
fgroup_by(Season) |>
fmutate(AvGoalsSeason = sum(TotalGoalsMonth)/sum(NumOfMatches))
Upvotes: 1