I am teaching myself the R tidyverse purrr package and am having trouble applying map() to a column of nested data frames. Could someone explain what I'm missing?
Using the base R ChickWeight dataset as an example, I can easily get the number of distinct chicks observed at each time point under diet #1 if I first filter for diet #1, like so:
library(tidyverse)
ChickWeight %>%
filter(Diet == 1) %>%
group_by(Time) %>%
summarise(counts = n_distinct(Chick))
This is great, but I would like to do it for each diet at once, and I thought nesting the data and iterating over it with map() would be a good approach.
This is what I did:
example <- ChickWeight %>%
nest(-Diet)
Implementing this map function then achieves what I'm aiming for:
map(example$data, ~ .x %>% group_by(Time) %>% summarise(counts = n_distinct(Chick)))
However, when I try to run the same command inside mutate(), using a pipe to store the result in another column of the original data frame, it fails:
example %>%
mutate(counts = map(data, ~ .x %>% group_by(Time) %>% summarise(counts = n_distinct(Chick))))
Error in eval(substitute(expr), envir, enclos) :
variable 'Chick' not found
Why does this occur?
I also tried it on the data frame split into a list and it didn't work.
ChickWeight %>%
split(.$Diet) %>%
map(data, ~ .x %>% group_by(Time) %>% summarise(counts = n_distinct(Chick)))
Upvotes: 2
Views: 3141
Reputation: 43334
Because you're using dplyr non-standard evaluation (NSE) inside of dplyr NSE, it's getting confused about which environment to search for Chick. It's probably a bug, really, but it can be avoided with the development version's new .data pronoun, which specifies where to look:
library(tidyverse)
ChickWeight %>%
nest(-Diet) %>%
mutate(counts = map(data,
~.x %>% group_by(Time) %>%
summarise(counts = n_distinct(.data$Chick))))
#> # A tibble: 4 × 3
#> Diet data counts
#> <fctr> <list> <list>
#> 1 1 <tibble [220 × 3]> <tibble [12 × 2]>
#> 2 2 <tibble [120 × 3]> <tibble [12 × 2]>
#> 3 3 <tibble [120 × 3]> <tibble [12 × 2]>
#> 4 4 <tibble [118 × 3]> <tibble [12 × 2]>
To pipe it through a list, omit the first argument of map(): the pipe already passes in the list to iterate over, so only the function remains to be supplied:
ChickWeight %>%
split(.$Diet) %>%
map(~ .x %>% group_by(Time) %>% summarise(counts = n_distinct(Chick))) %>% .[[1]]
#> # A tibble: 12 × 2
#> Time counts
#> <dbl> <int>
#> 1 0 20
#> 2 2 20
#> 3 4 19
#> 4 6 19
#> 5 8 19
#> 6 10 19
#> 7 12 19
#> 8 14 18
#> 9 16 17
#> 10 18 17
#> 11 20 17
#> 12 21 16
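As a rough sketch of why the piped call in the question fails: the pipe inserts its left-hand side as the first argument, so `lst %>% map(data, f)` is evaluated as `map(lst, data, f)`, and `data` ends up in the position where map() expects the function to apply. The names `lst`, `count_chicks`, `piped`, and `explicit` below are introduced here purely for illustration and are not from the original post.

```r
library(dplyr)
library(purrr)

# Illustrative helper (hypothetical name): distinct chicks per time point
count_chicks <- function(df) {
  df %>%
    group_by(Time) %>%
    summarise(counts = n_distinct(Chick))
}

# One data frame per diet, named by diet level
lst <- split(ChickWeight, ChickWeight$Diet)

# These two calls do the same thing: the pipe supplies `lst`
# as the first argument of map(), so it must not be repeated
piped    <- lst %>% map(count_chicks)
explicit <- map(lst, count_chicks)

# Compare the two results and inspect the list names
identical(piped, explicit)
names(piped)
```

Wrapping the summary in a named function is only a readability choice; the formula shorthand `~ .x %>% ...` used above behaves the same way.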
A simpler option would be to just group by both columns:
ChickWeight %>% group_by(Diet, Time) %>% summarise(counts = n_distinct(Chick))
#> Source: local data frame [48 x 3]
#> Groups: Diet [?]
#>
#> Diet Time counts
#> <fctr> <dbl> <int>
#> 1 1 0 20
#> 2 1 2 20
#> 3 1 4 19
#> 4 1 6 19
#> 5 1 8 19
#> 6 1 10 19
#> 7 1 12 19
#> 8 1 14 18
#> 9 1 16 17
#> 10 1 18 17
#> # ... with 38 more rows
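One way to convince yourself that the two-column grouping gives the same numbers as the filter-first approach from the question is to compare the diet 1 rows directly. This is only a sketch; `by_both` and `diet1` are names made up for the comparison, and the explicit ungroup() is there so the result does not carry leftover grouping.

```r
library(dplyr)

# Summary over both grouping columns, with grouping dropped afterwards
by_both <- ChickWeight %>%
  group_by(Diet, Time) %>%
  summarise(counts = n_distinct(Chick)) %>%
  ungroup()

# The filter-first version from the question, for diet 1 only
diet1 <- ChickWeight %>%
  filter(Diet == 1) %>%
  group_by(Time) %>%
  summarise(counts = n_distinct(Chick))

# Compare the diet 1 rows of the grouped summary with the filtered result
all.equal(by_both$counts[by_both$Diet == 1], diet1$counts)
```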
Upvotes: 6