runnin4tay5000
runnin4tay5000

Reputation: 87

R: How to create a new data frame from two other data frames

I have two data frames containing related data. It is related to the NFL. One df has player names and receiving targets by week (player df):

           Player  Tm Position  1 2  3  4 5  6
1      A.J. Green CIN       WR 13 8 11 12 8 10
2 Aaron Burbridge SFO       WR  0 1  0  2 0  0
3 Aaron Ripkowski GNB       RB  0 0  0  0 0  1
4  Adam Humphries TAM       WR  5 8 12  4 2  0
5    Adam Thielen MIN       WR  5 5  4  3 8  0
6 Adrian Peterson MIN       RB  2 3  0  0 0  0

The other data frame has recieving targets summed by team for each week (team df):

       Tm   `1`   `2`   `3`   `4`   `5`   `6`
   <fctr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     ARI    37    35    50    45    26    35
2     ATL    38    34    30    37    28    41
3     BAL    32    45    40    51    47    48
4     BUF    22    30    20    33    20    26
5     CAR    31    39    36    47    28    46
6     CHI    28    29    45    36    41    49
7     CIN    30    54    28    31    39    31
8     CLE    26    33    38    38    35    42
9     DAL    43    30    24    32    24    27
10    DEN    26    32    35    31    34    47
# ... with 22 more rows

What I am trying to do is create another data frame containing the target percentage by player, by week. So I need to match the team from the "Tm" column in the player df and the week column header (1-6).

I have figured out how to do this by merging them and then creating new rows, but as I add more data (weeks) I need to write more code:

a <- merge(playertgt, teamtgt, by="Tm") #merges the two 
    a$Wk1 <- a$`1.x` / a$`1.y`
    a$Wk2 <- a$`2.x` / a$`2.y`
    a$Wk3 <- a$`3.x` / a$`3.y`

So what I am looking for is a good way to do this that will auto update and doesn't make me have to create a df with a bunch of columns I don't need and that will update with new weeks as I add them to my source data.

If this is answered somewhere else I apologize, but I have been looking for a good way to do this for a day now, and I can't find it. Thanks in advance for your help!

Upvotes: 0

Views: 868

Answers (2)

sluedtke
sluedtke

Reputation: 334

it would be nice if you provide a bit of data the next time, that makes live a lot easier.

I think the main point is your data structure. I think you have to put your data into a long format (keyword is tidy-data I guess). I made up some data and hope I understood your problem correctly.

library(tidyr)
library(dplyr)


player_df = data.frame(team = c('ARI', 'BAL', 'BAL', 'CLE', 'CLE'),
                       player =c('A', 'B', 'C', 'D', 'F'), 
                       '1' = floor(runif(5, min=1, max=2)*10),
                       '2' = floor(runif(5, min=1, max=2)*10))
> player_df
  team player X1 X2
1  ARI      A 15 10
2  BAL      B 16 15
3  BAL      C 13 11
4  CLE      D 14 19
5  CLE      F 12 14

team_df = data.frame(team = c('ARI', 'BAL', 'CLE'),
                       '1' = floor(runif(3, min=10, max=20)*20),
                       '2' = floor(runif(3, min=10, max=20)*20))
> team_df
  team  X1  X2
1  ARI 281 205
2  BAL 362 309
3  CLE 323 238

Now, put both dataframes into a long format:

player_df = gather(player_df, week, player_value, -team, -player)
team_df = gather(team_df, week, team_value, -team)

> player_df
   team player week player_value
1   ARI      A   X1           15
2   BAL      B   X1           16
3   BAL      C   X1           13
4   CLE      D   X1           14
5   CLE      F   X1           12
6   ARI      A   X2           10
7   BAL      B   X2           15
8   BAL      C   X2           11
9   CLE      D   X2           19
10  CLE      F   X2           14
> team_df
  team week team_value
1  ARI   X1        281
2  BAL   X1        362
3  CLE   X1        323
4  ARI   X2        205
5  BAL   X2        309
6  CLE   X2        238

Now, join (or merge) them together. inner_join will by default join on common column names.

join_db = inner_join(player_df, team_df)
> join_db
   team player week player_value team_value
1   ARI      A   X1           15        281
2   BAL      B   X1           16        362
3   BAL      C   X1           13        362
4   CLE      D   X1           14        323
5   CLE      F   X1           12        323
6   ARI      A   X2           10        205
7   BAL      B   X2           15        309
8   BAL      C   X2           11        309
9   CLE      D   X2           19        238
10  CLE      F   X2           14        238

I think in that format you can do a lot more.

HTH

Stefan

Upvotes: 2

aichao
aichao

Reputation: 7455

You can do this with dplyr:

library(dplyr)
## Do a left outer join to match each player with total team targets
a <- left_join(playertgt,teamtgt, by="Tm")
## Compute percentage over all weeks selecting player columns ending with ".x"
## and dividing by corresponding team columns ending with ".y"
tgt.pct <- select(a,ends_with(".x")) / select(a,ends_with(".y"))
## set the column names to week + number
colnames(tgt.pct) <- paste0("week",seq_len(ncol(teamtgt)-1))
## construct the output data frame adding back the player and team columns
tgt.pct <- data.frame(Player=playertgt$Player,Tm=playertgt$Tm,tgt.pct)

Clearly, I am only using dplyr for the convenience of ends_with in selecting the columns after the join. A base-R approach using grepl to do this selection is:

a <- merge(playertgt, teamtgt, by="Tm", all.x=TRUE)
tgt.pct <- subset(a,select=grepl(".x$",colnames(a))) / subset(a,select=grepl(".y$",colnames(a)))
colnames(tgt.pct) <- paste0("week",seq_len(ncol(teamtgt)-1))
tgt.pct <- data.frame(Player=playertgt$Player,Tm=playertgt$Tm,tgt.pct)

Data: with your limited posted data, only AJ Green will have his target percentage computed:

playertgt <- structure(list(Player = structure(1:6, .Label = c("A.J. Green", 
"Aaron Burbridge", "Aaron Ripkowski", "Adam Humphries", "Adam Thielen", 
"Adrian Peterson"), class = "factor"), Tm = structure(c(1L, 4L, 
2L, 5L, 3L, 3L), .Label = c("CIN", "GNB", "MIN", "SFO", "TAM"
), class = "factor"), Position = structure(c(2L, 2L, 1L, 2L, 
2L, 1L), .Label = c("RB", "WR"), class = "factor"), X1 = c(13L, 
0L, 0L, 5L, 5L, 2L), X2 = c(8L, 1L, 0L, 8L, 5L, 3L), X3 = c(11L, 
0L, 0L, 12L, 4L, 0L), X4 = c(12L, 2L, 0L, 4L, 3L, 0L), X5 = c(8L, 
0L, 0L, 2L, 8L, 0L), X6 = c(10L, 0L, 1L, 0L, 0L, 0L)), .Names = c("Player", 
"Tm", "Position", "X1", "X2", "X3", "X4", "X5", "X6"), class = "data.frame", row.names = c(NA, 
-6L))
##           Player  Tm Position X1 X2 X3 X4 X5 X6
##1      A.J. Green CIN       WR 13  8 11 12  8 10
##2 Aaron Burbridge SFO       WR  0  1  0  2  0  0
##3 Aaron Ripkowski GNB       RB  0  0  0  0  0  1
##4  Adam Humphries TAM       WR  5  8 12  4  2  0
##5    Adam Thielen MIN       WR  5  5  4  3  8  0
##6 Adrian Peterson MIN       RB  2  3  0  0  0  0

teamtgt <- structure(list(Tm = structure(1:10, .Label = c("ARI", "ATL", 
"BAL", "BUF", "CAR", "CHI", "CIN", "CLE", "DAL", "DEN"), class = "factor"), 
    X1 = c(37L, 38L, 32L, 22L, 31L, 28L, 30L, 26L, 43L, 26L), 
    X2 = c(35L, 34L, 45L, 30L, 39L, 29L, 54L, 33L, 30L, 32L), 
    X3 = c(50L, 30L, 40L, 20L, 36L, 45L, 28L, 38L, 24L, 35L), 
    X4 = c(45L, 37L, 51L, 33L, 47L, 36L, 31L, 38L, 32L, 31L), 
    X5 = c(26L, 28L, 47L, 20L, 28L, 41L, 39L, 35L, 24L, 34L), 
    X6 = c(35L, 41L, 48L, 26L, 46L, 49L, 31L, 42L, 27L, 47L)), .Names = c("Tm", 
"X1", "X2", "X3", "X4", "X5", "X6"), class = "data.frame", row.names = c(NA, 
-10L))
##    Tm X1 X2 X3 X4 X5 X6
##1  ARI 37 35 50 45 26 35
##2  ATL 38 34 30 37 28 41
##3  BAL 32 45 40 51 47 48
##4  BUF 22 30 20 33 20 26
##5  CAR 31 39 36 47 28 46
##6  CHI 28 29 45 36 41 49
##7  CIN 30 54 28 31 39 31
##8  CLE 26 33 38 38 35 42
##9  DAL 43 30 24 32 24 27
##10 DEN 26 32 35 31 34 47

The result is:

##           Player  Tm     week1     week2     week3     week4     week5     week6
##1      A.J. Green CIN 0.4333333 0.1481481 0.3928571 0.3870968 0.2051282 0.3225806
##2 Aaron Burbridge SFO        NA        NA        NA        NA        NA        NA
##3 Aaron Ripkowski GNB        NA        NA        NA        NA        NA        NA
##4  Adam Humphries TAM        NA        NA        NA        NA        NA        NA
##5    Adam Thielen MIN        NA        NA        NA        NA        NA        NA
##6 Adrian Peterson MIN        NA        NA        NA        NA        NA        NA

Upvotes: 2

Related Questions