Reputation: 1792
I have a very simple question about referencing data columns within a nested dataframe.
For a reproducible example, I'll nest mtcars by the two values of the variable am:
library(tidyverse)
mtcars_nested <- mtcars %>%
  group_by(am) %>%
  nest()
mtcars_nested
which gives data that looks like this.
#> # A tibble: 2 x 2
#> # Groups:   am [2]
#>      am data
#>   <dbl> <list>
#> 1     1 <tibble [13 × 10]>
#> 2     0 <tibble [19 × 10]>
If I now wanted to use purrr::map to take the mean of mpg for each level of am, I wonder why this doesn't work:
take_mean_mpg <- function(df) {
  mean(df[["data"]]$mpg)
}
map(mtcars_nested, take_mean_mpg)
Error in df[["data"]] : subscript out of bounds
Or maybe a simpler question is: how should I properly reference the mpg column once it's nested? I know that this doesn't work:
mtcars_nested[["data"]]$mpg
Upvotes: 0
Views: 1977
Reputation: 3188
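Data frames (and tibbles) are lists of columns, not lists of rows, so when you pass the whole tibble mtcars_nested to map(), it iterates over the columns (am, then data), not over the rows. The first thing your function receives is therefore the numeric vector am, which has no element named "data", hence the "subscript out of bounds" error. You can instead use mutate() with your function and map_dbl(), so that your new column is a plain numeric column rather than a list column.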
library(tidyverse)
mtcars_nested <- mtcars %>%
  group_by(am) %>%
  nest()
mtcars_nested
take_mean_mpg <- function(df) {
  mean(df$mpg)
}
mtcars_nested %>%
  mutate(mean_mpg = map_dbl(.data[["data"]], take_mean_mpg))
The .data[["data"]] argument to map_dbl() gives it the data list column from your dataframe to iterate over, rather than the entire dataframe. The .data part of the argument has no relation to your column named "data"; it is the rlang pronoun .data, which refers to the dataframe being mutated. [["data"]] then retrieves the column named "data" from it. I use mutate() because you are trying (I assumed, perhaps incorrectly) to add a column with the averages to the nested dataframe. mutate() adds columns, so you add a column equal to the output of map() (or map_dbl()) applied with your function, which returns the list (or vector) of averages.
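As a point of comparison, here is a minimal sketch of the same idea using the bare column name instead of the .data pronoun. Inside mutate(), a bare name is looked up among the columns first, so data here should refer to the list column. The name with_means and the inline lambda are illustrative choices of mine, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

mtcars_nested <- mtcars %>%
  group_by(am) %>%
  nest()

# Inside mutate(), the bare name `data` refers to the list column,
# so map_dbl() receives one nested tibble at a time.
with_means <- mtcars_nested %>%
  mutate(mean_mpg = map_dbl(data, ~ mean(.x$mpg)))

with_means
```

The .data[["data"]] form is mainly useful when the column name is held in a string or when you want to be explicit that a column, not a variable from the calling environment, is meant.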
This can be a confusing concept. Although map() is often used to iterate over the rows of a dataframe, it technically iterates over a list. See the documentation, where under the arguments it says:
.x A list or atomic vector.
It also returns a list or a vector. The good news is that columns are just vectors or lists of values, so you pass it the column you want it to iterate over and assign the result to the column where you want it stored (this assignment happens with mutate()).
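To make the "list of columns" point concrete, the following sketch (my own illustration, with hypothetical variable names col_types and err_msg) maps over the whole nested tibble and records what each element is. It then reproduces the error from the question by indexing the am column the same way the original function did.

```r
library(dplyr)
library(tidyr)
library(purrr)

mtcars_nested <- mtcars %>%
  group_by(am) %>%
  nest()

# map_*() over the whole tibble visits each column in turn: am, then data.
col_types <- map_chr(mtcars_nested, typeof)
col_types

# The first element handed to the function is the numeric am column,
# which has no element named "data", so [["data"]] fails on it.
err_msg <- tryCatch(
  mtcars_nested$am[["data"]],
  error = function(e) conditionMessage(e)
)
err_msg
```

Passing only the data column, as in the mutate() call above, avoids this because each element of that column is itself a tibble with an mpg column.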
Upvotes: 2
Reputation: 389325
You should pass mtcars_nested$data to map() and take the mean of the mpg column.
take_mean_mpg <- function(df) {
  mean(df$mpg)
}
purrr::map(mtcars_nested$data, take_mean_mpg)
#[[1]]
#[1] 24.39231
#[[2]]
#[1] 17.14737
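On the simpler question of how to reference mpg once it is nested: the data column is a list of tibbles, so one option is to pull out a single tibble with [[ ]] before using $. The sketch below assumes the same mtcars_nested object; first_group_mpg and mean_by_group are names I made up for illustration. It also uses map_dbl() to get a plain numeric vector instead of a list.

```r
library(dplyr)
library(tidyr)
library(purrr)

mtcars_nested <- mtcars %>%
  group_by(am) %>%
  nest()

# Extract one nested tibble first, then its mpg column.
first_group_mpg <- mtcars_nested$data[[1]]$mpg

# map_dbl() returns a numeric vector; name each mean by its am level.
mean_by_group <- setNames(
  map_dbl(mtcars_nested$data, ~ mean(.x$mpg)),
  mtcars_nested$am
)
mean_by_group
```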
Upvotes: 2