Problem using rsample::bootstraps object with ".$" inside purrr::map

It's pretty common to use the %>% operator in conjunction with . as a placeholder for the left-hand-side (LHS) object of %>%, for example:

library(purrr)
mtcars %>% 
  split(.$cyl) %>%    # as you can see here
  map(~ lm(mpg ~ hp, data = .x))

But when using the rsample::bootstraps() function to create a tibble with a bootstrap list-column, where each element holds a dataset, I noticed an error with the . pattern described above that I don't understand well.

library(purrr)
# create 3 partitions and inspect the proportion of cyl == 4 in each (ERROR)
rsample::bootstraps(mtcars, times = 3) %>%
map_dbl(.$splits,
        function(x) {
                     cyl = as.data.frame(x)$cyl
                     mean(cyl == 4)
                    })
Error: Index 1 must have length 1, not 4
Run `rlang::last_error()` to see where the error occurred.

But if you instead store the output of rsample::bootstraps() in an object ex and then call map_dbl on ex$splits, as shown in the documentation, it works properly.

library(purrr)
# create 3 partitions
ex <- rsample::bootstraps(mtcars, times = 3)

# inspect the proportion of cyl == 4 in each partition (WORKS OK)
map_dbl(ex$splits,
        function(x) {
                     cyl = as.data.frame(x)$cyl
                     mean(cyl == 4)
                    })
 [1] 0.50000 0.28125 0.43750

Any idea why these two approaches behave differently?

Upvotes: 1

Views: 126

Answers (1)

MrFlick

Reputation: 206243

The problem is not really specific to rsample. This is just how the %>% from magrittr works. Consider

mtcars %>% 
  mean(.$carb)

This also fails (mean() returns NA with a warning instead of the mean of carb), because what it's basically calling is

mean(mtcars, mtcars$carb)

By default the pipe always places what you are piping in as the first parameter of the function. You can move it to a different parameter by using . alone as an argument, but a . nested inside another expression such as .$splits does not count. Since you are not using a bare . here, the entire object is still passed as the first parameter of the function, with .$splits as an additional parameter, and that doesn't match the signature of map_dbl that you want to use. This works fine with

mtcars %>% 
  split(.$cyl)

because split() expects the entire data.frame as its first parameter. The equivalent call without the pipe is

split(mtcars, mtcars$cyl)
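
The placement rule can be checked directly with a small sketch. The helper count_args() is hypothetical (it is not part of magrittr or purrr) and simply reports how many arguments it received, which shows whether the pipe added the left-hand side as an extra first argument.

```r
library(magrittr)

# Hypothetical helper: returns the number of arguments it was called with.
count_args <- function(...) length(list(...))

# `.` only appears nested inside `.$cyl`, so the pipe still inserts the
# data frame first: this is count_args(mtcars, mtcars$cyl).
n_nested <- mtcars %>% count_args(.$cyl)

# `.` is a direct argument, so nothing extra is inserted:
# this is count_args(mtcars).
n_direct <- mtcars %>% count_args(.)

# Inside braces the pipe inserts nothing: this is count_args(mtcars$cyl).
n_block <- mtcars %>% { count_args(.$cyl) }

c(nested = n_nested, direct = n_direct, block = n_block)
```

The nested case receives two arguments while the other two receive one, which is the same mismatch that breaks the map_dbl call in the question.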

It has always been the case that if you don't want the first parameter to be filled in for you, you can pipe into a block wrapped in {} instead.

You can do

rsample::bootstraps(mtcars, times = 3) %>%
{map_dbl(.$splits,
        function(x) {
                     cyl = as.data.frame(x)$cyl
                     mean(cyl == 4)
                    })}

Or you could pull the column explicitly

rsample::bootstraps(mtcars, times = 3) %>%
  dplyr::pull(splits) %>%
  map_dbl(
        function(x) {
                     cyl = as.data.frame(x)$cyl
                     mean(cyl == 4)
                    })
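
As a rough check, the two pipe-based variants can be compared against the non-piped call on one and the same bootstrap object. This is only a sketch: the seed value and the helper name prop_cyl4 are arbitrary choices made for illustration, and it assumes the rsample, purrr and dplyr packages are installed.

```r
library(purrr)

set.seed(123)  # arbitrary seed, only so the resamples are reproducible
boots <- rsample::bootstraps(mtcars, times = 3)

# Illustrative helper: proportion of rows with cyl == 4 in one resample.
prop_cyl4 <- function(split) mean(as.data.frame(split)$cyl == 4)

via_direct <- map_dbl(boots$splits, prop_cyl4)
via_block  <- boots %>% { map_dbl(.$splits, prop_cyl4) }
via_pull   <- boots %>% dplyr::pull(splits) %>% map_dbl(prop_cyl4)

# All three should agree, since they map over the same list of splits.
identical(via_direct, via_block)
identical(via_direct, via_pull)
```

Storing the bootstraps once matters here: calling rsample::bootstraps() separately in each pipeline would draw different resamples, so the results would not be comparable.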

Upvotes: 3
