lethalSinger
lethalSinger

Reputation: 616

Use tidyverse to generate new column inputting value from another column conditional on a value in an index-linked third column

I have a dataframe that looks like this (but with hundreds of columns):

df_in <- 
  data.frame(
    x_1 = 0:3,
    x_2 = 30:33,
    x_3 = 20:23,
    y_1 = c("a", "c", "b", "c"),
    y_2 = c("b", "a", "c", "b"),
    y_3 = c("c", "b", "a", "a"))

Let's say that the y columns contain the names of sports teams (a, b, c); x variables, linked to y variables by the column index number, is the number of goals that team scored; each row represents a 3-way contest.

For each contest I want create new columns capturing the performance of two of the teams, a and b. This is the output I want, using long-hand approach:

df_in %>% 
  mutate(
    a_value = 
      case_when(
        y_1 == "a" ~ x_1,
        y_2 == "a" ~ x_2,
        y_3 == "a" ~ x_3),
    b_value = 
      case_when(
        y_1 == "b" ~ x_1,
        y_2 == "b" ~ x_2,
        y_3 == "b" ~ x_3))

I need to make this more concise, given that there are many more columns in the real data. I'm guessing there's a tidyverse way to do this but I'm unsure.

Upvotes: 2

Views: 380

Answers (2)

Anoushiravan R
Anoushiravan R

Reputation: 21908

Here is another approach you could use which more or less the same:

library(dplyr)
library(tidyr)

df_in %>%
  bind_cols(df_in %>%
              pivot_longer(everything(), 
                           names_to = c(".value"), 
                           names_pattern = "([[:alpha:]])_\\d") %>%
              mutate(ID = rep(seq_len(n()/(length(unique(y)))), each = length(unique(y)))) %>%
              group_by(ID) %>%
              pivot_wider(names_from = y, values_from = x, names_glue = "value_{y}") %>%
              select(-ID))

Adding missing grouping variables: `ID`
  x_1 x_2 x_3 y_1 y_2 y_3 ID value_a value_b value_c
1   0  30  20   a   b   c  1       0      30      20
2   1  31  21   c   a   b  2      31      21       1
3   2  32  22   b   c   a  3      22       2      32
4   3  33  23   c   b   a  4      23      33       3

Upvotes: 0

camille
camille

Reputation: 16832

A bit more complicated than I thought on first glance. I gave each match an ID so you can (no pun intended) match values based on original row. The first step is to reshape so you're getting a variable (x or y) and an index (1, 2, 3,...). Reshaping back to wide gets you your a_value and b_value columns (I didn't drop the team c observations since you said this needed to scale). Then join this back to the original data frame by match. You could do the join all in one step (similar to a SQL subquery) but that is a bit messy.

library(tidyr)
df_in <- tibble::rowid_to_column(df_in, "match")
team_vals <- df_in %>%
  pivot_longer(-match, names_to = c(".value", "index"), names_pattern = "(^[a-z])_(\\d$)") %>%
  pivot_wider(id_cols = match, names_from = y, values_from = x, names_glue = "{y}_value")
team_vals
#> # A tibble: 4 × 4
#>   match a_value b_value c_value
#>   <int>   <int>   <int>   <int>
#> 1     1       0      30      20
#> 2     2      31      21       1
#> 3     3      22       2      32
#> 4     4      23      33       3

dplyr::left_join(df_in, team_vals, by = "match")
#>   match x_1 x_2 x_3 y_1 y_2 y_3 a_value b_value c_value
#> 1     1   0  30  20   a   b   c       0      30      20
#> 2     2   1  31  21   c   a   b      31      21       1
#> 3     3   2  32  22   b   c   a      22       2      32
#> 4     4   3  33  23   c   b   a      23      33       3

Upvotes: 2

Related Questions