Reputation: 2807
I have grouped data, and nest() seems to be a great option to quickly summarize it. I would like to select groups based on information in the nested tibble list column, namely the number of observations. How can I access these numbers?
library(dplyr) # group_by() comes from dplyr
library(tidyr)
library(gapminder)
gapminder %>%
group_by(continent) %>%
nest()
# A tibble: 5 x 2
continent data
<fct> <list>
1 Asia <tibble [396 x 5]>
2 Europe <tibble [360 x 5]>
3 Africa <tibble [624 x 5]>
4 Americas <tibble [300 x 5]>
5 Oceania <tibble [24 x 5]>
# How to select continents with more than 350 observations?
I tried combinations of sample_n(), but of course every element of the nested data has size 1. How can I access the number of observations, which I can see so easily in my console output, for further selections?
Upvotes: 0
Views: 258
Reputation: 1438
Another possible solution (extremely similar to r2evans's) uses map_lgl() from library(purrr), like so:
library(tidyverse)
library(gapminder)
gapminder %>%
group_by(continent) %>%
nest() %>%
filter(purrr::map_lgl(data, ~ nrow(.x) > 350))
#> # A tibble: 3 x 2
#> # Groups: continent [5]
#> continent data
#> <fct> <list>
#> 1 Asia <tibble [396 x 5]>
#> 2 Europe <tibble [360 x 5]>
#> 3 Africa <tibble [624 x 5]>
Created on 2020-03-25 by the reprex package (v0.3.0)
Upvotes: 1
Reputation: 160447
Untested.
gapminder %>%
group_by(continent) %>%
nest() %>%
mutate(n = sapply(data, NROW)) %>%
filter(n > 350)
or just
gapminder %>%
group_by(continent) %>%
nest() %>%
filter(sapply(data, NROW) > 350)
You can replace sapply with purrr::map_int.
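For completeness, here is a minimal sketch of the purrr::map_int variant mentioned above. The column name n_obs and the object names nested and large are only illustrative, and the ungroup() call is an added assumption so that the list column is processed as a whole rather than group by group.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(gapminder)

nested <- gapminder %>%
  group_by(continent) %>%
  nest() %>%
  ungroup() %>%                        # drop the grouping left over from group_by()
  mutate(n_obs = map_int(data, nrow))  # one integer row count per nested tibble

# Keep only the continents with more than 350 observations
large <- nested %>%
  filter(n_obs > 350)

large
```

Unlike sapply, map_int either returns an integer vector or throws an error, so the type of the comparison in filter() does not depend on the input.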
Upvotes: 1