Morts81
Morts81

Reputation: 439

Manipulating R Data Frames

I've currently got two separate data frames, excerpts as per below:

mydata

Player  TG% Pts Team    Opp Yr      Rd  Grnd
John    56  42  A       1   2015    1   Grnd1
James   94  64  B       2   2015    1   Grnd2
Jerry   85  78  C       3   2015    1   Grnd3
Daniel  97  51  D       4   2015    1   Grnd4
John    89  61  A       1   2015    1   Grnd2
James   65  26  B       4   2015    1   Grnd3
Jerry   73  34  C       3   2015    1   Grnd2
Daniel  73  40  D       2   2015    1   Grnd2
John    89  26  A       1   2015    1   Grnd3
James   92  42  B       3   2015    1   Grnd1
Jerry   89  25  C       2   2015    1   Grnd2
Daniel  80  41  D       4   2015    1   Grnd2
John    73  82  A       3   2015    1   Grnd3
James   73  41  B       4   2015    1   Grnd3
Jerry   89  76  C       2   2015    1   Grnd1
Daniel  91  77  D       1   2015    1   Grnd2

round

Team   Opp    Grnd
A      1      Grnd1
B      3      Grnd4
C      4      Grnd2
D      2      Grnd3

What I want to be able to do is manipulate this so that I generate a second data frame as per below

Player   Gms   Avg.Pts   Avg.Last3   Avg.v.Opp   [email protected]
John
James
Jerry
Daniel

I know how to do this in Excel, however I'm struggling in R

Gms - total number of games for each individual player (excel would be countif)

Avg.Pts - this is the average of Pts for each Player name (excel would be averageif)

Avg.Last3 - this is the average of Pts for each Player in their last 3 games, note that the data frame is in order with most recent games at the end of the data frame.

Avg.v.Opp - this is the average of Pts for each player against the next opponent as defined in data frame round. For example John plays for team A and his next opponent is Opp 1. (excel would be averageifs)

[email protected] - this is the average of Pts for each player at the next ground as defined in data fram round. For example John plays for team A and his next game is held at Grnd1. (excel would be averageifs)

I've tried using dplyr and a number of other options but haven't seemed to successfully put together something that works at this stage. Note that mydata data frame runs to over 10,000+ rows.

Upvotes: 0

Views: 76

Answers (1)

Gregor Thomas
Gregor Thomas

Reputation: 145755

I think this will work. If you share your sample data with dput(), I'll be happy to copy/paste it and check (and debug if necessary).

First I'll do the easy ones, the ones that don't depend on round:

library(dplyr)
group_by(mydata, Player) %>%
   summarize(Gms = n(),
       Avg.Pts = mean(Pts),
       Avg.Last3 = mean(tail(Pts, 3)))

I wanted to do that one separately to emphasize how clean dplyr can be for simple cases. All the "ifs" in your Excel commands are taken care of by the single group_by at the beginning. n() is the count, and mean() is the average. tail() is a handy base function that returns the end of a data frame or vector.

To add in the round data, we'll want to join the data frames together based on the Team column. We still we'll want to be able to tell the other columns apart whether they're from mydata or round, so I'll rename the round columns:

round = rename(round, next_opp = Opp, next_grnd = Grnd)

Then we'll start with the join and proceed as before. This time we do need some ifs at the end, which I'll do with a simple subset inside the mean calls:

left_join(mydata, round) %>%
  # convert ground columns to character as discussed in comments
  mutate(next_grnd = as.character(next_grnd),
       Grnd = as.character(Grnd)) %>%
  group_by(Player) %>%
  summarize(Gms = n(),
       Avg.Pts = mean(Pts),
       Avg.Last3 = mean(tail(Pts, 3)),
       Avg.v.Opp = mean(Pts[Opp == next_opp]),
       Avg.at.Grnd = mean(Pts[Grnd == next_grnd]))

Upvotes: 3

Related Questions