James Martherus

Reputation: 1043

How to vectorize the RHS of dplyr::case_when?

Suppose I have a dataframe that looks like this:

> data <- data.frame(x = c(1,1,2,2,3,4,5,6), y = c(1,2,3,4,5,6,7,8))
> data
  x y
1 1 1
2 1 2
3 2 3
4 2 4
5 3 5
6 4 6
7 5 7
8 6 8

I want to use mutate and case_when to create a new id variable that identifies rows by the variable x and gives each row with a missing x its own unique id. In other words, rows 1 and 2 should share an id, rows 3 and 4 should share an id, and rows 5-8 should each get a unique id. Suppose I want to generate these id values with a function:

id_function <- function(x, n){
  set.seed(x)
  res <- character(n)
  for(i in seq(n)){
    res[i] <- paste0(sample(c(letters, LETTERS, 0:9), 32), collapse="")
  }
  res
}

id_function(1, 1)
[1] "4dMaHwQnrYGu0PTjgioXKOyW75NRZtcf"

I am trying to use this function on the RHS of a case_when expression like this:

library(dplyr)

data %>%
  mutate(my_id = id_function(1234, nrow(.)),
         my_id = dplyr::case_when(!is.na(x) ~ id_function(x, 1),
                                  TRUE ~ my_id))

But the RHS does not seem to be vectorized and I get the same value for all non-missing values of x:

   x y                            my_id
1  1 1 4dMaHwQnrYGu0PTjgioXKOyW75NRZtcf
2  1 2 4dMaHwQnrYGu0PTjgioXKOyW75NRZtcf
3  2 3 4dMaHwQnrYGu0PTjgioXKOyW75NRZtcf
4  2 4 4dMaHwQnrYGu0PTjgioXKOyW75NRZtcf
5 NA 5 0vnws5giVNIzp86BHKuOZ9ch4dtL3Fqy
6 NA 6 IbKU6DjvW9ypitl7qc25Lr4sOwEfghdk
7 NA 7 8oqQMPx6IrkGhXv4KlUtYfcJ5Z1RCaDy
8 NA 8 BRsjumlCEGS6v4ANrw1bxLynOKkF90ao

I'm sure there's a way to vectorize the RHS. What am I doing wrong? Is there an easier approach to solving this problem?

Upvotes: 2

Views: 111

Answers (2)

Ben

Reputation: 30549

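The purrr map functions can apply a non-vectorized function element by element. The following will give you a similar result. map2_chr passes each value of x, together with the constant 1, as the two arguments expected by your id_function, and returns a character vector (plain map2 would return a list column instead).

library(tidyverse)

data %>%
  mutate(my_id = map2_chr(x, 1, id_function))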

Output

  x y                            my_id
1 1 1 4dMaHwQnrYGu0PTjgioXKOyW75NRZtcf
2 1 2 4dMaHwQnrYGu0PTjgioXKOyW75NRZtcf
3 2 3 uof7FhqC3lOXkacp54MGZJLUR6siSKDb
4 2 4 uof7FhqC3lOXkacp54MGZJLUR6siSKDb
5 3 5 e5lMJNQEhtj4VY1KbCR9WUiPrpy7vfXo
6 4 6 3kYcgR7109DLbxatQIAKXFeovN8pnuUV
7 5 7 bQ4ok7OuDgscLUlpzKAivBj2T3m6wrWy
8 6 8 0jSn3Jcb2HDA5uhvG8g1ytsmRpl6CQWN
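As for why the original attempt repeats one value: id_function(x, 1) receives the whole x column but returns a single string (set.seed only uses the first element of a vector), and case_when recycles that string across every matching row. Here is a minimal sketch of one way to keep case_when while handling missing x. The wrapper name id_vec and the NA-containing example data are my own assumptions, not part of the question's code; the NA guard matters because case_when evaluates the RHS for every row, and set.seed(NA) is an error.

```r
library(dplyr)

# id_function exactly as defined in the question
id_function <- function(x, n){
  set.seed(x)
  res <- character(n)
  for(i in seq(n)){
    res[i] <- paste0(sample(c(letters, LETTERS, 0:9), 32), collapse="")
  }
  res
}

# Hypothetical wrapper (not from the question): returns one id per element
# of x, and NA where x is missing so that set.seed() is never called on NA
id_vec <- function(x) {
  vapply(x, function(xi) {
    if (is.na(xi)) NA_character_ else id_function(xi, 1)
  }, character(1))
}

# Assumed example data with missing x, mirroring the question's printed output
data <- data.frame(x = c(1, 1, 2, 2, NA, NA, NA, NA), y = 1:8)

result <- data %>%
  mutate(fallback = id_function(1234, n()),        # one candidate id per row
         my_id = case_when(!is.na(x) ~ id_vec(x),  # id derived from x
                           TRUE ~ fallback)) %>%   # unique id when x is NA
  select(-fallback)
```

Note that id_function calls set.seed, so each call resets the global RNG state as a side effect.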

Upvotes: 1

Arthur Welle

Reputation: 708

I guess rowwise() would do the trick, since mutate() then calls id_function once per row:

data %>%
  rowwise() %>% 
  mutate(my_id = id_function(x, 1))

x y my_id
1 1 4dMaHwQnrYGu0PTjgioXKOyW75NRZtcf
1 2 4dMaHwQnrYGu0PTjgioXKOyW75NRZtcf
2 3 uof7FhqC3lOXkacp54MGZJLUR6siSKDb
2 4 uof7FhqC3lOXkacp54MGZJLUR6siSKDb
3 5 e5lMJNQEhtj4VY1KbCR9WUiPrpy7vfXo
4 6 3kYcgR7109DLbxatQIAKXFeovN8pnuUV
5 7 bQ4ok7OuDgscLUlpzKAivBj2T3m6wrWy
6 8 0jSn3Jcb2HDA5uhvG8g1ytsmRpl6CQWN
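Since rows with the same x always get the same id, a possible alternative is to call id_function once per distinct x value and then look the result up for each row. This is only a base R sketch under the assumption that x has no missing values (as in the data frame defined at the top of the question); the names ux and lookup are made up for illustration.

```r
# id_function exactly as defined in the question
id_function <- function(x, n){
  set.seed(x)
  res <- character(n)
  for(i in seq(n)){
    res[i] <- paste0(sample(c(letters, LETTERS, 0:9), 32), collapse="")
  }
  res
}

data <- data.frame(x = c(1,1,2,2,3,4,5,6), y = c(1,2,3,4,5,6,7,8))

# One call per distinct value of x instead of one call per row
ux <- unique(data$x)
lookup <- vapply(ux, function(xi) id_function(xi, 1), character(1))

# Map every row back to the id generated for its x value
data$my_id <- lookup[match(data$x, ux)]
```

On a data frame with many repeated x values this should make far fewer calls than a row-by-row approach, at the cost of a slightly less direct pipeline.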

Upvotes: 2
