Reputation: 85
I have kinda a dumb r question. I'm trying to aggregate the total number of events in the same week (Year_Week) by location (Field). The below screenshot of my table is similar to my csv file. I was already able to sum the Score by Field and Year_Week using the following code: aggregate(x = df$Score, by = list(df$Year_Week, df$Field), FUN = sum)
However, I'm not sure about how to sum the events by Year_Week and Field. For example, because 2 games occurred during the week of 2019-4 in Irvine, I'd like my output to show "2". Similarly, because three games occurred the week of 2019-2, I want my output to show "3". I'd like my output table to look something like this:
where sum_Game is the number of games that occurred in one location (Field) in a given Year_Week, and sum_Score is the sum of the scores of those games. Thanks in advance!
Upvotes: 1
Views: 116
Reputation: 887851
Using data.table
, convert the data to data.table (setDT
), grouped by 'Field', 'Year_Week', and get the number of observations (.N
) and sum
of 'Score' to return a summarised dataset with columns 'sum_Game' and 'sum_Score'
library(data.table)
setDT(df)[, .(sum_Game = .N, sum_Score = sum(Score)), by = .(Field, Year_Week)]
NOTE: The sum_Game
would not be a good column name as this gives the indication that we are summing up the 'Game_id' column which we are not. Instead, it is just returning the number of games.
In short, if we are careful in naming objects, column names etc., there wouldn't be much need for detailed description about the code
Upvotes: 1
Reputation: 79204
Are you looking for such a solution?
With aggregate
aggregate(.~Year_Week+Field, df, sum)
With dplyr
library(dplyr)
df %>%
group_by(Year_Week, Field) %>%
summarize(sumScour = sum(Score))
Year_Week Field sumScour
<chr> <chr> <int>
1 2019-12 Irvine 8
2 2019-2 Anaheim 4
3 2019-24 Anaheim 1
4 2019-4 Irvine 11
Upvotes: 1