Margot J

Reputation: 33

Storing and calling variables in a column in dplyr within a function

I want to store a set of variable names in each cell of a list-column in a tibble. I then want to use that column either to paste the names themselves together, or to paste together the values of the columns those names refer to. All of this happens inside a function, and it is the only piece of hard coding left, so I'd really like to find a way to solve it.

library("tidyverse") 
myData<-tibble("c1"=c("a","b","c"),
"c2"=c("1","2","3"),
"c3"=c("A","B","C"),
factors=c(list(c("c1","c2")),list(c("c2","c3")),list(c("c1","c2","c3"))))

myData%>%mutate(factors1=interaction(!!!quos(factors),sep=":",lex.order=TRUE))
# A tibble: 3 x 5
  c1    c2    c3    factors   factors1
  <chr> <chr> <chr> <list>    <fct>   
1 a     1     A     <chr [2]> c1:c2:c1
2 b     2     B     <chr [2]> c2:c3:c2
3 c     3     C     <chr [3]> c1:c2:c3

So this allows me to concatenate the names of the variables but as you can see, if one list is longer than the others, it loops.
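One plausible reading of the output above: interaction() accepts a single list giving its factors, so the three list elements are treated as three separate factors and combined position by position, with the shorter ones recycled. A minimal base-R sketch, using paste() as a stand-in for interaction() (an assumption made purely for illustration; the name fs is hypothetical), shows the same pattern next to the per-element collapse that the desired factors1 calls for:

```r
# The list-column from the question as a plain list (hypothetical name `fs`).
fs <- list(c("c1", "c2"), c("c2", "c3"), c("c1", "c2", "c3"))

# Position-wise combination across the list elements: paste() recycles the
# two shorter vectors to the length of the longest one.
by_position <- do.call(paste, c(fs, sep = ":"))
print(by_position)

# Element-wise collapse: one string per list element, no recycling involved.
per_element <- vapply(fs, paste, character(1), collapse = ":")
print(per_element)
```

The first result reproduces the "looping" values in factors1; the second gives one string per row, which is the shape the desired output asks for (as character, not factor).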

For the second problem, where I want to use the factors column to look up the values of other columns, I can hardcode it like so:

myData%>%
mutate(factors2=interaction(!!!syms(c("c1","c2")),sep=":",lex.order=TRUE))
# A tibble: 3 x 5
 c1    c2    c3    factors   factors2
 <chr> <chr> <chr> <list>    <fct>   
1 a     1     A     <chr [2]> a:1     
2 b     2     B     <chr [2]> b:2     
3 c     3     C     <chr [3]> c:3  

However if I try this:

myData%>%
mutate(factors2=interaction(!!!syms(factors),sep=":",lex.order=TRUE))

Error in lapply(.x, .f, ...) : object 'factors' not found

The same happens if I try to unlist the factors or use other rlang expressions. I have also tried nesting rlang expressions but so far haven't found one that works as I intended.

I feel like this should be possible, but so far I haven't found a question on Stack Overflow or a tutorial that shows how, so maybe I'm on a wild goose chase. Thank you all for your time and help.

My code in full:

library("tidyverse") 

myData<-tibble("c1"=c("a","b","c"),
"c2"=c("1","2","3"),
"c3"=c("A","B","C"),
factors=c(list(c("c1","c2")),list(c("c2","c3")),list(c("c1","c2","c3"))))%>%
mutate(factors1=interaction(!!!quos(factors),sep=":",lex.order=TRUE))%>%
mutate(factors2=interaction(!!!syms(factors),sep=":",lex.order=TRUE))

My desired output is:

# A tibble: 3 x 6
  c1    c2    c3    factors   factors1 factors2
  <chr> <chr> <chr> <list>    <fct>    <fct>
1 a     1     A     <chr [2]> c1:c2    a:1
2 b     2     B     <chr [2]> c2:c3    2:B
3 c     3     C     <chr [3]> c1:c2:c3 c:3:C

Upvotes: 3

Views: 772

Answers (2)

acylam

Reputation: 18681

Here is a method using map and imap:

library(tidyverse)

myData %>%
  mutate(factor1 = factors %>% map(~interaction(as.list(.), sep=':', lex.order = TRUE)) %>% unlist(),
         factor2 = factors %>% imap(~interaction(myData[.y, match(.x, names(myData))], sep=":", lex.order = TRUE)) %>% unlist())

For factor1, instead of splicing the arguments into dots, I pass a list into interaction.

For factor2, I match the names stored in factors for each row against the names of myData, and use the resulting column index (match(.x, names(myData))) together with the row index (.y from imap) to subset the appropriate elements to feed into interaction.

Both factor1 and factor2 require an unlist because map and imap return lists.

Output:

# A tibble: 3 x 6
  c1    c2    c3    factors   factor1  factor2
  <chr> <chr> <chr> <list>    <fct>    <fct>  
1 a     1     A     <chr [2]> c1:c2    a:1    
2 b     2     B     <chr [2]> c2:c3    2:B    
3 c     3     C     <chr [3]> c1:c2:c3 c:3:C  
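The row-index plus column-name subsetting described for factor2 can also be sketched in base R. This is only an illustration under stated assumptions: a plain data.frame (hypothetical name df) stands in for the tibble, paste() replaces interaction(), and the result is a character vector rather than a factor. The helper paste_row is hypothetical.

```r
# Base-R stand-in for the question's tibble (assumption: a plain data.frame
# with a list-column is close enough for this illustration).
df <- data.frame(c1 = c("a", "b", "c"),
                 c2 = c("1", "2", "3"),
                 c3 = c("A", "B", "C"),
                 stringsAsFactors = FALSE)
df$factors <- list(c("c1", "c2"), c("c2", "c3"), c("c1", "c2", "c3"))

# For row i, pick the columns named in factors[[i]] and collapse their values.
paste_row <- function(i, cols) paste(unlist(df[i, cols]), collapse = ":")

# Walk the row indices and the list-column in parallel.
factor2 <- mapply(paste_row, seq_len(nrow(df)), df$factors)
print(factor2)
```

Wrapping the result in factor() would bring it closer to the output above, if a factor column is needed.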

Upvotes: 1

Artem Sokolov

Reputation: 13691

Your first question can be addressed with the purrr::map and purrr::lift families of functions:

myData %>%
  mutate( factors1 = map(factors, lift_dv(interaction, sep=":", lex.order=TRUE)) ) %>%
  mutate_at( "factors1", lift(fct_c) )
# # A tibble: 3 x 5
#   c1    c2    c3    factors   factors1
#   <chr> <chr> <chr> <list>    <fct>
# 1 a     1     A     <chr [2]> c1:c2
# 2 b     2     B     <chr [2]> c2:c3
# 3 c     3     C     <chr [3]> c1:c2:c3

The second question is trickier, because !!! evaluates its argument immediately, before mutate() has made the factors column available, which can lead to unintuitive behavior inside a dplyr chain. The cleanest way is to define a standalone function that composes your interaction expressions:

f <- function(fct) {expr( interaction(!!!syms(fct), sep=":", lex.order=TRUE) )}

# Example usage
f( myData$factors[[1]] )    # interaction(c1, c2, sep = ":", lex.order = TRUE)
f( myData$factors[[2]] )    # interaction(c2, c3, sep = ":", lex.order = TRUE)

myData %>% mutate( e = map(factors, f) )
# # A tibble: 3 x 5
#   c1    c2    c3    factors   e
#   <chr> <chr> <chr> <list>    <list>
# 1 a     1     A     <chr [2]> <language>
# 2 b     2     B     <chr [2]> <language>
# 3 c     3     C     <chr [3]> <language>

Unfortunately, we can't evaluate e directly, because it will feed the entire columns c1, c2, and c3 to the expressions, whereas you only want a single value that is in the same row as the expression. For this reason, we need to encapsulate columns c1 through c3 in a row-wise fashion.
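The difference between evaluating an expression against whole columns and against a single row can be sketched in base R. The following is an illustration under assumptions, not the method used in this answer: a plain data.frame (hypothetical name df) replaces the tibble, base eval() replaces rlang::eval_tidy(), and build_call is a hypothetical base-R analogue of f.

```r
# Base-R stand-in for the question's data (assumption: plain data.frame).
df <- data.frame(c1 = c("a", "b", "c"),
                 c2 = c("1", "2", "3"),
                 c3 = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

# Hypothetical analogue of f(): build an unevaluated interaction() call
# from a character vector of column names.
build_call <- function(cols) {
  as.call(c(list(as.name("interaction")),
            lapply(cols, as.name),
            list(sep = ":", lex.order = TRUE)))
}

e1 <- build_call(c("c1", "c2"))
print(e1)

# Evaluated against the whole data frame: every row contributes a value.
whole <- as.character(eval(e1, df))
print(whole)

# Evaluated against a single row: only that row's values are seen.
one_row <- as.character(eval(e1, df[1, ]))
print(one_row)
```

Pairing each expression with a one-row slice of the data is what keeps the result to a single value per row.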

X <- myData %>% mutate( e = map(factors, f) ) %>%
  rowwise() %>% mutate( d = list(tibble(c1,c2,c3)) ) %>% ungroup()
# # A tibble: 3 x 6
#   c1    c2    c3    factors   e          d
#   <chr> <chr> <chr> <list>    <list>     <list>
# 1 a     1     A     <chr [2]> <language> <tibble [1 × 3]>
# 2 b     2     B     <chr [2]> <language> <tibble [1 × 3]>
# 3 c     3     C     <chr [3]> <language> <tibble [1 × 3]>

Now you have expressions in e that need to be applied to data in d, so it's just a simple map2 traversal from here. Putting everything together and cleaning up, we get:

myData %>%
  mutate( factors1 = map(factors, lift_dv(interaction, sep=":", lex.order=TRUE)) ) %>%
  mutate( e = map(factors, f) ) %>%
  rowwise() %>% mutate( d = list(tibble(c1,c2,c3)) ) %>% ungroup() %>%
  mutate( factors2 = map2( e, d, rlang::eval_tidy ) ) %>%
  mutate_at( vars(factors1,factors2), lift(fct_c) ) %>%
  select( -e, -d )
# # A tibble: 3 x 6
#   c1    c2    c3    factors   factors1 factors2
#   <chr> <chr> <chr> <list>    <fct>    <fct>
# 1 a     1     A     <chr [2]> c1:c2    a:1
# 2 b     2     B     <chr [2]> c2:c3    2:B
# 3 c     3     C     <chr [3]> c1:c2:c3 c:3:C

Upvotes: 1
