Reputation: 1183
It's pretty common to use the %>% operator in conjunction with the dot (.) as a placeholder for the left-hand-side (LHS) object of %>%, for example:
library(purrr)
mtcars %>%
  split(.$cyl) %>% # as you can see here
  map(~ lm(mpg ~ hp, data = .x))
But when I use the rsample::bootstraps() function to create a tibble with a list-column of bootstrap samples, where each element holds a dataset, I get an error with the dot pattern described above that I don't understand.
library(purrr)
# create 3 partitions
# inspect the proportion of cyl == 4 in each partition (ERROR)
rsample::bootstraps(mtcars, times = 3) %>%
  map_dbl(.$splits,
          function(x) {
            cyl = as.data.frame(x)$cyl
            mean(cyl == 4)
          })
Error: Index 1 must have length 1, not 4
Run `rlang::last_error()` to see where the error occurred.
If you instead store the output of rsample::bootstraps() in an object ex and then call map_dbl() on ex$splits, as shown in the documentation, it works properly.
library(purrr)
# create 3 partitions
ex <- rsample::bootstraps(mtcars, times = 3)
# inspect the proportion of cyl == 4 in each partition (WORKS OK)
map_dbl(ex$splits,
        function(x) {
          cyl = as.data.frame(x)$cyl
          mean(cyl == 4)
        })
[1] 0.50000 0.28125 0.43750
Any idea why these two approaches behave differently?
Upvotes: 1
Views: 126
Reputation: 206243
The problem is not really specific to rsample. This is just how the %>% from magrittr works. Consider
mtcars %>%
  mean(.$carb)
This doesn't work either, because what it's actually calling is
mean(mtcars, mtcars$carb)
By default the pipe places what you are piping in as the first argument of the function. You can move it to a different argument by using . on its own as an argument, but you are not doing that here: the dot only appears inside the expression .$splits. So the entire object is still passed as the first argument, along with an additional argument of .$splits, and that doesn't match the signature of map_dbl that you want to use. The same pattern works fine with
mtcars %>%
  split(.$cyl)
because split() expects the entire data.frame as its first argument, so the call the pipe builds is exactly the correct one:
split(mtcars, mtcars$cyl)
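To make the argument placement visible, here is a minimal sketch. The helper count_args() is hypothetical (it is not part of magrittr or purrr); it only reports how many arguments it received, so you can see when the pipe adds the LHS as a first argument and when it does not.

```r
library(magrittr)

# Hypothetical helper for illustration: returns the number of arguments received
count_args <- function(...) length(list(...))

df <- data.frame(a = 1:3)

# Dot only inside a nested expression: df is still inserted first,
# so this is count_args(df, df$a)
nested <- df %>% count_args(.$a)

# Dot used on its own as an argument: this is count_args(df)
alone <- df %>% count_args(.)

# Dot on its own in another position: this is count_args(1, df)
moved <- df %>% count_args(1, .)

print(c(nested = nested, alone = alone, moved = moved))
```

The first case is the one that trips up the map_dbl call in the question: the function receives one argument more than was written.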
If you don't want the first argument to be filled in for you, you can pipe into a braced block ({}) instead. You can do
rsample::bootstraps(mtcars, times = 3) %>%
  {map_dbl(.$splits,
           function(x) {
             cyl = as.data.frame(x)$cyl
             mean(cyl == 4)
           })}
Or you could pull the column explicitly
rsample::bootstraps(mtcars, times = 3) %>%
  dplyr::pull(splits) %>%
  map_dbl(
    function(x) {
      cyl = as.data.frame(x)$cyl
      mean(cyl == 4)
    })
Upvotes: 3