Reputation: 21
I have a DataFrame in R with the following columns:
Here is an example of my data:
df <- data.frame(
season = c("2015/2016", "2015/2016"),
stage = c(1, 1),
home_team_api_id = c(1, 2),
away_team_api_id = c(2, 1),
home_team_goal = c(3, 2),
away_team_goal = c(1, 3),
match_api_id = c(101, 102)
)
I want to transform this DataFrame into a long format where each match has two rows: one for the home team and one for the away team. The following columns should be included in the transformed DataFrame:
Desired output: For the example input data, the desired output would look like this:
season stage match_api_id team_api_id opponent_team_api_id goals goals_conceded is_home
1 2015/2016 1 101 1 2 3 1 TRUE
2 2015/2016 1 101 2 1 1 3 FALSE
3 2015/2016 1 102 2 1 2 3 TRUE
4 2015/2016 1 102 1 2 3 1 FALSE
Here is what I've tried so far:
df_long <- df %>%
pivot_longer(cols = c(home_team_api_id, away_team_api_id),
names_to = "team_type",
values_to = "team_api_id") %>%
mutate(
is_home = ifelse(team_type == "home_team_api_id", TRUE, FALSE),
goals = ifelse(is_home, home_team_goal, away_team_goal),
goals_conceded = ifelse(is_home, away_team_goal, home_team_goal)
) %>%
select(match_api_id, season, stage, team_api_id, goals, goals_conceded, is_home)
# opponent_team_api_id basierend auf match_api_id anhängen
df_long <- df_long %>%
left_join(df %>%
select(match_api_id, home_team_api_id, away_team_api_id),
by = "match_api_id") %>%
mutate(
opponent_team_api_id = ifelse(is_home, away_team_api_id, home_team_api_id)
) %>%
select(-home_team_api_id, -away_team_api_id)
This is my result:
match_api_id season stage team_api_id goals goals_conceded is_home
1 2015/2016 1 1 1 3 TRUE
2 2015/2016 1 2 1 3 FALSE
3 2015/2016 1 2 2 3 TRUE
4 2015/2016 1 1 3 2 FALSE
What makes it difficult and seperates it from this question, is that I want to apply pivot_longer two times at once. I want to longer the goals and the teamIDs
How can I achieve this transformation in R? Any help would be greatly appreciated!
Thank you!
Upvotes: -1
Views: 61
Reputation: 109
I would pivot_longer
into 4 rows and then pivot_wider
back to 2 rows.
main <- df |>
pivot_longer(cols = c(home_team_api_id, away_team_api_id, home_team_goal, away_team_goal)) |>
separate_wider_delim(name, delim = "_team_", names = c("is_home", "var")) |>
pivot_wider(names_from = var, values_from = value)
The result would be:
season stage match_api_id is_home api_id goal
<chr> <dbl> <dbl> <chr> <dbl> <dbl>
1 2015/2016 1 101 home 1 3
2 2015/2016 1 101 away 2 1
3 2015/2016 1 102 home 2 2
4 2015/2016 1 102 away 1 3
If it is absolutely necessary to retain information about the opponent (which appears to be redundant), duplicate another dataset and merge it back.
sub <- main |>
mutate(is_home = ifelse(is_home == "home", "away", "home")) |>
rename_with(~ paste0("opponent_", .x), api_id:goal)
complete <- left_join(main, sub) |>
mutate(is_home = ifelse(is_home == "home", TRUE, FALSE))
The result would be:
season stage match_api_id is_home api_id goal opponent_api_id opponent_goal
<chr> <dbl> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl>
1 2015/2016 1 101 TRUE 1 3 2 1
2 2015/2016 1 101 FALSE 2 1 1 3
3 2015/2016 1 102 TRUE 2 2 1 3
4 2015/2016 1 102 FALSE 1 3 2 2
I believe there are better solutions, such as simply mutate
variables with the original dataset, or use group_by
and summarise
after step 1.
Upvotes: 0