Reputation: 439
I've currently got two separate data frames, excerpts as per below:
mydata
Player TG% Pts Team Opp Yr Rd Grnd
John 56 42 A 1 2015 1 Grnd1
James 94 64 B 2 2015 1 Grnd2
Jerry 85 78 C 3 2015 1 Grnd3
Daniel 97 51 D 4 2015 1 Grnd4
John 89 61 A 1 2015 1 Grnd2
James 65 26 B 4 2015 1 Grnd3
Jerry 73 34 C 3 2015 1 Grnd2
Daniel 73 40 D 2 2015 1 Grnd2
John 89 26 A 1 2015 1 Grnd3
James 92 42 B 3 2015 1 Grnd1
Jerry 89 25 C 2 2015 1 Grnd2
Daniel 80 41 D 4 2015 1 Grnd2
John 73 82 A 3 2015 1 Grnd3
James 73 41 B 4 2015 1 Grnd3
Jerry 89 76 C 2 2015 1 Grnd1
Daniel 91 77 D 1 2015 1 Grnd2
round
Team Opp Grnd
A 1 Grnd1
B 3 Grnd4
C 4 Grnd2
D 2 Grnd3
What I want to be able to do is manipulate this so that I generate a second data frame as per below
Player Gms Avg.Pts Avg.Last3 Avg.v.Opp [email protected]
John
James
Jerry
Daniel
I know how to do this in Excel, however I'm struggling in R
Gms - total number of games for each individual player (excel would be countif)
Avg.Pts - this is the average of Pts for each Player name (excel would be averageif)
Avg.Last3 - this is the average of Pts for each Player in their last 3 games, note that the data frame is in order with most recent games at the end of the data frame.
Avg.v.Opp - this is the average of Pts for each player against the next opponent as defined in data frame round. For example John plays for team A and his next opponent is Opp 1. (excel would be averageifs)
[email protected] - this is the average of Pts for each player at the next ground as defined in data fram round. For example John plays for team A and his next game is held at Grnd1. (excel would be averageifs)
I've tried using dplyr and a number of other options but haven't seemed to successfully put together something that works at this stage. Note that mydata data frame runs to over 10,000+ rows.
Upvotes: 0
Views: 76
Reputation: 145755
I think this will work. If you share your sample data with dput()
, I'll be happy to copy/paste it and check (and debug if necessary).
First I'll do the easy ones, the ones that don't depend on round
:
library(dplyr)
group_by(mydata, Player) %>%
summarize(Gms = n(),
Avg.Pts = mean(Pts),
Avg.Last3 = mean(tail(Pts, 3)))
I wanted to do that one separately to emphasize how clean dplyr
can be for simple cases. All the "ifs" in your Excel commands are taken care of by the single group_by
at the beginning. n()
is the count, and mean()
is the average. tail()
is a handy base function that returns the end of a data frame or vector.
To add in the round
data, we'll want to join the data frames together based on the Team
column. We still we'll want to be able to tell the other columns apart whether they're from mydata
or round
, so I'll rename the round
columns:
round = rename(round, next_opp = Opp, next_grnd = Grnd)
Then we'll start with the join
and proceed as before. This time we do need some if
s at the end, which I'll do with a simple subset inside the mean
calls:
left_join(mydata, round) %>%
# convert ground columns to character as discussed in comments
mutate(next_grnd = as.character(next_grnd),
Grnd = as.character(Grnd)) %>%
group_by(Player) %>%
summarize(Gms = n(),
Avg.Pts = mean(Pts),
Avg.Last3 = mean(tail(Pts, 3)),
Avg.v.Opp = mean(Pts[Opp == next_opp]),
Avg.at.Grnd = mean(Pts[Grnd == next_grnd]))
Upvotes: 3