Reputation: 1878

How to extract one specific group in dplyr

Given a grouped tbl, can I extract one or a few groups? Such a function would be useful when prototyping code, e.g.:

mtcars %>%
  group_by(cyl) %>%
  select_first_n_groups(2) %>%
  do({'complicated expression'})

Surely, one can do an explicit filter before grouping, but that can be cumbersome.

Upvotes: 17

Answers (3)

Sarah

Reputation: 3499

I know this is an old question, but I came across it while looking for something similar and realised that this has become much easier since dplyr 1.0, so I thought others might be looking for it too.

You can simply group and filter on cur_group_id(). If you know which group you are after, you could also use cur_group(), although it is arguably just as easy to filter on the values you want. I can imagine the two being useful in combination if you have a heavily grouped data frame and just want the first group with a confirmed match in a category or two. You would need to be pedantic about the group order, though, as in my second example.

library(dplyr)

starwars %>% group_by(homeworld, species) %>% filter(cur_group_id() == 1)
#> # A tibble: 3 x 14
#> # Groups:   homeworld, species [1]
#>   name  height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
#> 1 Leia~    150    49 brown      light      brown             19 fema~ femin~
#> 2 Bail~    191    NA black      tan        brown             67 male  mascu~
#> 3 Raym~    188    79 brown      light      brown             NA male  mascu~
#> # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

starwars %>% group_by(homeworld, species, eye_color) %>%
  filter(grepl("Tatooine Human", paste(cur_group(), collapse = " "))) %>%
  filter(cur_group_id() == 1)
#> # A tibble: 5 x 14
#> # Groups:   homeworld, species, eye_color [1]
#>   name  height  mass hair_color skin_color eye_color birth_year sex   gender
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
#> 1 Luke~    172    77 blond      fair       blue            19   male  mascu~
#> 2 Owen~    178   120 brown, gr~ light      blue            52   male  mascu~
#> 3 Beru~    165    75 brown      light      blue            47   fema~ femin~
#> 4 Anak~    188    84 blond      fair       blue            41.9 male  mascu~
#> 5 Clie~    183    NA brown      fair       blue            82   male  mascu~
#> # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

Created on 2023-08-11 by the reprex package (v0.3.0)
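The same idea extends to the question's "first n groups" case. The following is only a sketch: select_first_n_groups() is the hypothetical name used in the question, not a dplyr function, and the argument k is an assumption. It relies on cur_group_id() numbering groups in the order of the sorted grouping keys.

```r
library(dplyr)

# Hypothetical helper (name borrowed from the question, not part of dplyr):
# keep only the rows that belong to the first k groups of a grouped data frame.
select_first_n_groups <- function(data, k) {
  filter(data, cur_group_id() <= k)
}

first_two <- mtcars %>%
  group_by(cyl) %>%
  select_first_n_groups(2)

# The rows keep their original order; only cyl == 4 and cyl == 6 remain.
sort(unique(first_two$cyl))
```

Because the result is still grouped, it can be piped straight into the rest of a prototype pipeline.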

Upvotes: 2

Holger Brandl

Reputation: 11192

With a bit of dplyr along with some nesting/unnesting (provided by the tidyr package), you could write a small helper to get the first (or any) group:

library(dplyr)
library(tidyr)
first = function(x) x %>% nest %>% ungroup %>% slice(1) %>% unnest(data)
mtcars %>% group_by(cyl) %>% first()

By adjusting the slicing you could also extract the nth or any range of groups by index, but typically the first or the last is what most users want.
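As a rough sketch of that adjustment, the slice index can be turned into an argument. The name groups_at and the parameter idx are made up for illustration. Positions refer to the row order that nest() produces, which in recent tidyr versions appears to follow the first appearance of each key rather than the sorted keys, so it is worth checking the order on your own data.

```r
library(dplyr)
library(tidyr)

# Hypothetical generalisation of the helper above: `idx` is a vector of
# group positions, counted in the order in which nest() returns the groups.
groups_at <- function(x, idx) {
  x %>%
    nest() %>%       # one row per group, remaining columns packed into `data`
    ungroup() %>%
    slice(idx) %>%   # keep only the requested group rows
    unnest(data)     # expand back to the original rows
}

two_groups <- mtcars %>% group_by(cyl) %>% groups_at(2:3)
```

Note that the result is ungrouped, just as with the one-group helper.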

The name is inspired by functional APIs, which all call it first (see the standard libraries of e.g. Kotlin, Python, Scala, Java, Spark).

Edit: Faster Version

A more scalable version (>50x faster on large datasets) that avoids nesting would be

first_group = function(x) x %>%
    select(group_cols()) %>%
    distinct %>%
    ungroup %>%
    slice(1) %>%
    { semi_join(x, .)}

Another positive side effect of this improved version is that it fails if no grouping is present in x.
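A possible variant of the semi-join approach, given here only as a sketch: first_k_groups and its argument k are hypothetical names. It keeps the first k distinct group keys, checks for grouping explicitly instead of relying on the join to fail, and passes by = so that the join does not print a message.

```r
library(dplyr)

# Hypothetical variant of the semi-join helper: keep the first `k` distinct
# group keys (in the order distinct() returns them) and all rows matching them.
first_k_groups <- function(x, k = 1) {
  stopifnot(is_grouped_df(x))            # fail early on ungrouped input
  keys <- x %>%
    ungroup() %>%
    select(all_of(group_vars(x))) %>%    # grouping columns only
    distinct() %>%
    slice(seq_len(k))
  semi_join(x, keys, by = group_vars(x))
}

two_groups <- mtcars %>% group_by(cyl) %>% first_k_groups(2)
```

The result stays grouped, because semi_join() returns the rows of x together with its grouping.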

Upvotes: 15

G. Grothendieck

Reputation: 269481

Try this, where groups is a vector of group numbers. Here 1:2 means the first two groups:

select_groups <- function(data, groups, ...) 
   data[sort(unlist(attr(data, "indices")[ groups ])) + 1, ]

mtcars %>% group_by(cyl) %>% select_groups(1:2)

The selected rows appear in the original order. If you prefer that the rows appear in the order in which the groups are specified (e.g. in the above example, the rows of the first group followed by the rows of the second group), then remove the sort.
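The "indices" attribute used above belongs to older dplyr releases. From dplyr 0.8.0 onwards the grouping metadata is stored differently, and group_rows() returns the row positions of each group, 1-based. A sketch of the same helper under that assumption:

```r
library(dplyr)

# Sketch for dplyr >= 0.8.0: group_rows() gives one integer vector of
# 1-based row positions per group, so no "+ 1" offset is needed.
select_groups <- function(data, groups) {
  rows <- unlist(group_rows(data)[groups])
  data[sort(rows), ]   # drop sort() to get the rows group by group instead
}

first_two <- mtcars %>% group_by(cyl) %>% select_groups(1:2)
```

Here the group numbers follow the sorted grouping keys, so 1:2 selects cyl == 4 and cyl == 6.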

Upvotes: 9
