Reputation: 7141
This has been a tricky problem for which I am really excited to hear solutions. I have what I call "double-columns", i.e. columns of which the content can be split into two separate columns.
This is my input:
structure(list(`A1-A2` = c(2, 1, 1), `A1-A3` = c(2, 1, 2)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 3 x 2
`A1-A2` `A1-A3`
<dbl> <dbl>
1 2 2
2 1 1
3 1 2
For one column, I can demonstrate what I want to do, but not for several:
data %>%
separate(`A1-A2`, into = c("A1", "A2"), sep = ":") %>%
mutate_at(.vars = c(1:2), as.numeric) %>%
mutate(A2 = A1 -1) %>%
mutate(A1 = ifelse(A1 == 2, 0, A1))
# A tibble: 3 x 3
A1 A2 `A1-A3`
<dbl> <dbl> <dbl>
1 0 1 2
2 1 0 1
3 1 0 2
A1-A2
column into two separate columns A1 and A2. The resulting table should finally aggregate all winning scores for each column like this:
# A tibble: 1 x 3
A1 A2 A3
1 3 1 2
Two challenges:
How formulate my code in a generic format for any number of double-columns?
How can you avoid problems because several split columns have the same name (e.g. when the double-columns A1-A2, A1-A3, A2-A3
are split, they will have A1, A2, A3
occurring twice)??
Approaches in tidyverse (purrr::map)
are preferred, but I am open to other solutions.
Tricky, isn't it?
Upvotes: 2
Views: 72
Reputation: 7141
I put together this solution helped by @akrun who inspired to use pivot_longer
and mutate
with case_when
. If anybody has a more elegant or shorter solution, please post!
data
# A tibble: 3 x 2
`A1-A2` `A1-A3`
<dbl> <dbl>
1 2 2
2 1 1
3 1 2
comparisons <- data %>%
pivot_longer(everything()) %>%
separate(name, c("V1", "V2"), sep = "-") %>%
mutate(win = case_when(value == 2 ~ V2, TRUE ~ V1)) %>%
select(-value) %T>% print
# A tibble: 6 x 3
V1 V2 win
<chr> <chr> <chr>
1 A1 A2 A2
2 A1 A3 A3
3 A1 A2 A1
4 A1 A3 A1
5 A1 A2 A1
6 A1 A3 A3
scores <- comparisons %>%
group_by(win) %>%
tally() %>%
pivot_wider(names_from = win, values_from = n) %T>% print
# A tibble: 1 x 3
A1 A2 A3
<int> <int> <int>
1 3 1 2
Upvotes: 1