Reputation: 33
This is the first time I am posting a question here, so please be gentle :)
I have a data frame with goal and corner statistics from the English football/soccer league (Premier League) with one game per row.
head(premierleague) will give you something like this (made up data):
| Home | Home_goals | Away | Away_goals | Home_Corners | Away_Corners |
|------------ |------------ |----------- |------------ |-------------- |-------------- |
| Tottenham | 1 | Arsenal | 0 | 5 | 2 |
| Man United | 2 | Watford | 1 | 7 | 4 |
| Man City | 3 | West Ham | 0 | 10 | 2 |
| Chelsea | 2 | Arsenal | 1 | 7 | 6 |
| Tottenham | 4 | Norwich | 1 | 6 | 0 |
| Man United | 2 | Liverpool | 2 | 4 | 7 |
| Tottenham | 0 | Man City | 2 | 3 | 8 |
I would like to find for each entry in the column Home (first one is in this case Tottenham) the next 2 matching entries (rows 5 and 7) and paste them into new columns in row 1.
I want to do this for each row in my data frame and also keep all rows. I just want to add the statistics of the next two games as new columns:
Home_2
Home_goals_2
Away_2 and so on.
I honestly have no idea how to even google this and as far as my experience goes with stackoverflow, I am sure some of you will solve that within minutes :)
Any help is highly appreciated.
Thanks so much in advance
Philipp
Edit:
Don't really know if I can attach something here, but the dataframe is this:
premierleague <- data.frame("Home" = c("Tottenham", "ManUnited", "ManCity", "Chelsea", "Tottenham", "ManUnited", "Tottenham"),
"Home_goals" = c(1,2,3,2,4,2,0),
"Away" = c ("Arsenal", "Watford", "Westham", "Arsenal", "Norwich", "Liverpool", "ManCity"),
"Away_goals" = c(0,1,0,1,1,2,2),
"Home_corners" = c(5,7,10,7,6,4,3),
"Away_corners" = c(2,4,2,6,0,7,8))
### The desired result looks like this
premierleague_new <- data.frame(
"Home" = c("Tottenham", "ManUnited", "ManCity", "Chelsea", "Tottenham", "ManUnited", "Tottenham"),
"Home_goals" = c(1,2,3,2,4,2,0),
"Away" = c("Arsenal", "Watford", "Westham", "Arsenal", "Norwich", "Liverpool", "ManCity"),
"Away_goals" = c(0,1,0,1,1,2,2),
"Home_corners" = c(5,7,10,7,6,4,3),
"Away_corners" = c(2,4,2,6,0,7,8),
"Home_goals_2" = c(4,2,NA,NA,0,NA,NA),
"Away_2" = c("Norwich", "Liverpool",NA,NA,"ManCity",NA,NA),
"Away_goal_2" = c(1,2,NA,NA,2,NA,NA),
"Home_corn_2" = c(6,4,NA,NA,3,NA,NA),
"Away_corn_2" = c(0,7,NA,NA,8,NA,NA),
"Home_goal_3" = c(0,NA,NA,NA,NA,NA,NA),
"Away_3" = c("ManCity",NA,NA,NA,NA,NA,NA),
"Away_goal_3" = c(2,NA,NA,NA,NA,NA,NA),
"Home_corners_3" = c(3,NA,NA,NA,NA,NA,NA),
"Away_corners_3" = c(8,NA,NA,NA,NA,NA,NA)
)
Tottenham is the only team with all in all 3 games, so all columns are filled for Tottenham in row 1.
The 2nd entry for Tottenham in row 5 has only values for the second game because there is only one below that with Tottenham as home team in this example.
I hope, it's clearer now. Should be at least reproduceable.
Upvotes: 3
Views: 66
Reputation: 389012
We can group_by
Home
and use lead
to get values from next rows.
library(dplyr)
premierleague %>%
group_by(Home) %>%
mutate_at(vars(Home_goals:Away_corners), list(`2` = ~lead(.), `3` = ~lead(., 2)))
# Home Home_goals Away Away_goals Home_corners Away_corners Home_goals_2 Away_2
# <fct> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <fct>
#1 Tott… 1 Arse… 0 5 2 4 Norwi…
#2 ManU… 2 Watf… 1 7 4 2 Liver…
#3 ManC… 3 West… 0 10 2 NA NA
#4 Chel… 2 Arse… 1 7 6 NA NA
#5 Tott… 4 Norw… 1 6 0 0 ManCi…
#6 ManU… 2 Live… 2 4 7 NA NA
#7 Tott… 0 ManC… 2 3 8 NA NA
# … with 8 more variables: Away_goals_2 <dbl>, Home_corners_2 <dbl>,
# Away_corners_2 <dbl>, Home_goals_3 <dbl>, Away_3 <fct>, Away_goals_3 <dbl>,
# Home_corners_3 <dbl>, Away_corners_3 <dbl>
Upvotes: 1