MYjx

Reputation: 4407

Get Mutate Error When Applying Purrr::Map on Grouped Data

Hi, I am trying to apply a very simple function using purrr::map, but I keep getting this error: Error in mutate_impl(.data, dots) : Evaluation error: unused argument (.x[[i]]).

The code is below:

library(dplyr)  # provides %>%, group_by and mutate

data = data.frame(name = c('A', 'B', 'C'), metric = c(0.29, 0.39,0.89))
get_sample_size = function(metric, threshold = 0.01){

  sample_size =  ceiling((1.96^2)*(metric*(1-metric))/(threshold^2))
  return(data.frame(sample_size))
}
data %>% group_by(name) %>% tidyr::nest() %>% 
  dplyr::mutate(result = purrr::map( .x = data, .f = get_sample_size,  metric = metric, threshold = 0.01 ))

Upvotes: 1

Views: 1901

Answers (2)

camille

Reputation: 16832

When you pass metric in the ... part of map, it isn't clear that it refers to a column in the nested data frame. Once you nest the data as you've done, metric is no longer a column of the outer data frame; it's a column of each nested frame, and those frames sit in a list column that is also called "data". (This is a good example of why you want more specific variable names, by the way.)

If you're mapping over the data column, you can use $metric to point to that column, either by writing out a function, as I've done here (df$metric), or in formula notation (.$metric).

As @www said, you don't need nested data frames in this case. But for a more complicated case, you might need nested data frames to work with, such as for building models, so it's good to know how to reference exactly the data you want.

library(tidyverse)

data %>% 
  group_by(name) %>% 
  tidyr::nest() %>%
  mutate(result = map(data, function(df) {
    get_sample_size(metric = df$metric, threshold = 0.01)
  }))
#> # A tibble: 3 x 3
#>   name  data             result              
#>   <fct> <list>           <list>              
#> 1 A     <tibble [1 × 1]> <data.frame [1 × 1]>
#> 2 B     <tibble [1 × 1]> <data.frame [1 × 1]>
#> 3 C     <tibble [1 × 1]> <data.frame [1 × 1]>

Created on 2019-01-16 by the reprex package (v0.2.1)
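
To make the cause of the error concrete: map passes each element of the list as the first unnamed argument of .f. Since metric and threshold are already supplied by name, get_sample_size has no parameter left to receive that element, hence "unused argument". The sketch below reproduces this on a one-element list (the exact wording of the message may differ between purrr versions, so it is only captured, not printed), then shows the formula-notation equivalent of the fix. The names err, nested and flat are illustrative only, and the unnest step is an optional extra that assumes a reasonably recent tidyr.

```r
library(dplyr)
library(purrr)
library(tidyr)

data <- data.frame(name = c("A", "B", "C"), metric = c(0.29, 0.39, 0.89))

get_sample_size <- function(metric, threshold = 0.01) {
  sample_size <- ceiling((1.96^2) * (metric * (1 - metric)) / (threshold^2))
  data.frame(sample_size)
}

# Reproduce the failure in isolation: the list element is passed positionally,
# but both parameters of get_sample_size are already taken by named arguments.
err <- tryCatch(
  map(list(data.frame(metric = 0.29)), get_sample_size,
      metric = 0.29, threshold = 0.01),
  error = function(e) conditionMessage(e)
)

# Formula notation: .x is the nested tibble for the current row,
# so .x$metric picks the column out of it.
nested <- data %>%
  group_by(name) %>%
  nest() %>%
  mutate(result = map(data, ~ get_sample_size(metric = .x$metric, threshold = 0.01)))

# Optional: flatten the list column back into ordinary columns.
flat <- nested %>%
  select(name, result) %>%
  unnest(result) %>%
  ungroup()
```

Inside the formula, .x and . refer to the same thing, so .$metric works as well.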

Upvotes: 1

www

Reputation: 39154

You don't need nest. The metric argument of get_sample_size should be a numeric vector, but after nest, the data column is a list of data frames, which cannot be used as input for the metric argument.

I think you can use summarize and map to apply your function to the metric column.

library(tidyverse)

data %>% 
  group_by(name) %>% 
  summarize(result = purrr::map(.x = metric, 
                                .f = get_sample_size,  
                                threshold = 0.01))
# # A tibble: 3 x 2
#   name  result              
#   <fct> <list>              
# 1 A     <data.frame [1 x 1]>
# 2 B     <data.frame [1 x 1]>
# 3 C     <data.frame [1 x 1]>
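
If a list column is not actually required, the arithmetic in get_sample_size is already vectorized, so it can be applied to the metric column directly. The following is a minimal sketch under that assumption; sample_size_vec is a hypothetical variant of the original function that returns a numeric vector instead of a data frame.

```r
library(dplyr)

data <- data.frame(name = c("A", "B", "C"), metric = c(0.29, 0.39, 0.89))

# Hypothetical helper: same formula as get_sample_size, but returns a plain
# numeric vector so the result can be stored as an ordinary column.
sample_size_vec <- function(metric, threshold = 0.01) {
  ceiling((1.96^2) * (metric * (1 - metric)) / (threshold^2))
}

result <- data %>%
  mutate(sample_size = sample_size_vec(metric, threshold = 0.01))
```

This gives one numeric sample_size per row, with no map or nest involved.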

Upvotes: 1
