Reputation: 616
I have a dataframe that looks like this (but with hundreds of columns):
df_in <-
data.frame(
x_1 = 0:3,
x_2 = 30:33,
x_3 = 20:23,
y_1 = c("a", "c", "b", "c"),
y_2 = c("b", "a", "c", "b"),
y_3 = c("c", "b", "a", "a"))
Let's say that the y columns contain the names of sports teams (a, b, c); x variables, linked to y variables by the column index number, is the number of goals that team scored; each row represents a 3-way contest.
For each contest I want create new columns capturing the performance of two of the teams, a and b. This is the output I want, using long-hand approach:
df_in %>%
mutate(
a_value =
case_when(
y_1 == "a" ~ x_1,
y_2 == "a" ~ x_2,
y_3 == "a" ~ x_3),
b_value =
case_when(
y_1 == "b" ~ x_1,
y_2 == "b" ~ x_2,
y_3 == "b" ~ x_3))
I need to make this more concise, given that there are many more columns in the real data. I'm guessing there's a tidyverse way to do this but I'm unsure.
Upvotes: 2
Views: 380
Reputation: 21908
Here is another approach you could use which more or less the same:
library(dplyr)
library(tidyr)
df_in %>%
bind_cols(df_in %>%
pivot_longer(everything(),
names_to = c(".value"),
names_pattern = "([[:alpha:]])_\\d") %>%
mutate(ID = rep(seq_len(n()/(length(unique(y)))), each = length(unique(y)))) %>%
group_by(ID) %>%
pivot_wider(names_from = y, values_from = x, names_glue = "value_{y}") %>%
select(-ID))
Adding missing grouping variables: `ID`
x_1 x_2 x_3 y_1 y_2 y_3 ID value_a value_b value_c
1 0 30 20 a b c 1 0 30 20
2 1 31 21 c a b 2 31 21 1
3 2 32 22 b c a 3 22 2 32
4 3 33 23 c b a 4 23 33 3
Upvotes: 0
Reputation: 16832
A bit more complicated than I thought on first glance. I gave each match an ID so you can (no pun intended) match values based on original row. The first step is to reshape so you're getting a variable (x or y) and an index (1, 2, 3,...). Reshaping back to wide gets you your a_value
and b_value
columns (I didn't drop the team c observations since you said this needed to scale). Then join this back to the original data frame by match. You could do the join all in one step (similar to a SQL subquery) but that is a bit messy.
library(tidyr)
df_in <- tibble::rowid_to_column(df_in, "match")
team_vals <- df_in %>%
pivot_longer(-match, names_to = c(".value", "index"), names_pattern = "(^[a-z])_(\\d$)") %>%
pivot_wider(id_cols = match, names_from = y, values_from = x, names_glue = "{y}_value")
team_vals
#> # A tibble: 4 × 4
#> match a_value b_value c_value
#> <int> <int> <int> <int>
#> 1 1 0 30 20
#> 2 2 31 21 1
#> 3 3 22 2 32
#> 4 4 23 33 3
dplyr::left_join(df_in, team_vals, by = "match")
#> match x_1 x_2 x_3 y_1 y_2 y_3 a_value b_value c_value
#> 1 1 0 30 20 a b c 0 30 20
#> 2 2 1 31 21 c a b 31 21 1
#> 3 3 2 32 22 b c a 22 2 32
#> 4 4 3 33 23 c b a 23 33 3
Upvotes: 2