Reputation: 3081
I have a tibble in which I have, among others, some character columns and a single list column containing additional character in only some rows, otherwise blank. Each time this analysis is used, the number of character columns may change, but the naming remains consistent - they always start C (e.g. C1L, C1H etc).
Example dataset:
exemplar = tibble(group = c(rep("group1", 5), rep("group2",5)),
char1 = letters[1:10],
char2 = letters[2:11],
char3 = letters[3:12],
extra = list("",
"",
"",
"",
"",
list("x","y"),
"",
"",
"",
"")
)
But in 'live' use there could be 2, 4, 6, however many "char#" columns.
I'm trying to add a list column containing a list or character vector of all unique values in the char+extra columns. I can do it rowwise, but I need to do it for each group. Example desired output:
# A tibble: 10 x 6
group char1 char2 char3 extra unique
<chr> <chr> <chr> <chr> <list> <list>
1 group1 a b c <chr [1]> <chr [7]>
2 group1 b c d <chr [1]> <chr [7]>
3 group1 c d e <chr [1]> <chr [7]>
4 group1 d e f <chr [1]> <chr [7]>
5 group1 e f g <chr [1]> <chr [7]>
6 group2 f g h <list [2]> <chr [9]>
7 group2 g h i <chr [1]> <chr [9]>
8 group2 h i j <chr [1]> <chr [9]>
9 group2 i j k <chr [1]> <chr [9]>
10 group2 j k l <chr [1]> <chr [9]>
each 'group 1' row has c("a", "b", "c", "d", "e", "f", "g") in the unique
column.
each 'group 2' row has c("f", "g", "h", "i", "j", "k", "l", "x", "y") in the unique
column.
Things I have tried:
> exemplar %>% group_by(group) %>% mutate(unique = unique(select(., starts_with("c"), "extra")))
Adding missing grouping variables: `group`
Error: Problem with `mutate()` input `unique`.
x Input `unique` can/'t be recycled to size 5.
> exemplar %>%
+ mutate(unique = pmap_chr(
+ .l = select(., starts_with("c"), "extra"),
+ .f = function(...) unique(...)
+ )
+ )
Error: Problem with `mutate()` input `unique`.
x argument "x" is missing, with no default
> exemplar %>%
+ mutate(unique = pmap_chr(
+ .l = select(., starts_with("c"), "extra"),
+ .f = function(...) unique(...)
+ )
+ )
Error: Problem with `mutate()` input `unique`.
x argument "x" is missing, with no default
I feel like I've completely missed a simple way to do this - should I be unlisting the 'extra' column earlier? Given the variable number of columns to include, should I be creating a selection vector to use instead of using select(., ...) to do it in-place? Do I need to do it in a few steps then discard the unneeded columns?
Upvotes: 2
Views: 1422
Reputation: 3429
I think the following code might do what you ask. The trick is to combine the values from different columns, some of them containing characters, and the other one containing "lists". So the first steps are to extract the information from both types of columns as vectors (with c_across(starts_with("c"))
and unlist(extra)
) ads combine them into a vector on which you will be able to work.
exemplar %>%
group_by(group) %>%
mutate(unique = list( # Makes sure that the new column is a "list"
unique( # Get the "unique" values
c( # Combine results from two types of columns
c_across( # First extract the "char" columns into a vector
starts_with("c")),
unlist(extra)) # Then extract the "extra" column into a vector
)
)) %>%
ungroup()
The result of this command is the following
# A tibble: 10 × 6
group char1 char2 char3 extra unique
<chr> <chr> <chr> <chr> <list> <list>
1 group1 a b c <chr [1]> <chr [8]>
2 group1 b c d <chr [1]> <chr [8]>
3 group1 c d e <chr [1]> <chr [8]>
4 group1 d e f <chr [1]> <chr [8]>
5 group1 e f g <chr [1]> <chr [8]>
6 group2 f g h <list [2]> <chr [10]>
7 group2 g h i <chr [1]> <chr [10]>
8 group2 h i j <chr [1]> <chr [10]>
9 group2 i j k <chr [1]> <chr [10]>
10 group2 j k l <chr [1]> <chr [10]>
For group 1, the result is
[[1]]
[1] "a" "b" "c" "d" "e" "f" "g" ""
And for group 1,
[[1]]
[1] "f" "g" "h" "i" "j" "k" "l" "x" "y" ""
Upvotes: 1