Marco

Reputation: 2807

How to select groups from nested data in dplyr?

I have grouped data, and nest() seems to be a great option for quickly summarizing it. I would like to select groups based on information about the nested tibbles in the list column, namely the number of observations. How can I access these numbers?

library(dplyr)  # provides group_by() and %>%
library(tidyr)
library(gapminder)

gapminder %>% 
  group_by(continent) %>%
  nest()

# A tibble: 5 x 2
  continent data              
  <fct>     <list>            
1 Asia      <tibble [396 x 5]>
2 Europe    <tibble [360 x 5]>
3 Africa    <tibble [624 x 5]>
4 Americas  <tibble [300 x 5]>
5 Oceania   <tibble [24 x 5]> 

# How to select continents with more than 350 observations?

I tried combinations of sample_n(), but of course every element of the nested data has size 1. How can I access the number of observations, which I can see so easily in my console output, and use it for further selections?

Upvotes: 0

Views: 258

Answers (2)

tomasu

Reputation: 1438

Another possible solution (extremely similar to r2evans's answer) uses map_lgl() from the purrr package:

library(tidyverse)
library(gapminder)

gapminder %>% 
  group_by(continent) %>%
  nest() %>% 
  filter(purrr::map_lgl(data, ~ nrow(.x) > 350))
#> # A tibble: 3 x 2
#> # Groups:   continent [5]
#>   continent data              
#>   <fct>     <list>            
#> 1 Asia      <tibble [396 x 5]>
#> 2 Europe    <tibble [360 x 5]>
#> 3 Africa    <tibble [624 x 5]>

Created on 2020-03-25 by the reprex package (v0.3.0)

Upvotes: 1

r2evans

Reputation: 160447

Untested.

gapminder %>%
  group_by(continent) %>%
  nest() %>%
  mutate(n = sapply(data, NROW)) %>%
  filter(n > 350)

or just

gapminder %>%
  group_by(continent) %>%
  nest() %>%
  filter(sapply(data, NROW) > 350)

You can replace sapply(data, NROW) with purrr::map_int(data, nrow).
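For reference, here is a minimal sketch of the map_int variant, using the 350 cutoff from the question. It assumes the dplyr, tidyr, purrr and gapminder packages are installed. The name big_continents and the ungroup() step are my own additions for illustration, not part of the code above.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(gapminder)

big_continents <- gapminder %>%
  group_by(continent) %>%
  nest() %>%
  # drop the grouping left over from group_by() (illustrative choice)
  ungroup() %>%
  # count the rows of each nested tibble; map_int() always returns integers
  mutate(n = map_int(data, nrow)) %>%
  # keep only continents with more than 350 observations
  filter(n > 350)

big_continents
```

Unlike sapply(), map_int() throws an error if any element does not yield a single integer, so the type of the n column is guaranteed.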

Upvotes: 1
