Reputation: 1878
Given a grouped tbl, can I extract one or a few groups? Such a function can be useful when prototyping code, e.g.:
mtcars %>%
group_by(cyl) %>%
select_first_n_groups(2) %>%
do({'complicated expression'})
Surely, one can do an explicit filter before grouping, but that can be cumbersome.
Upvotes: 17
Views: 9069
Reputation: 3499
I know this is an old question, but I was looking for something similar and came across it. I then realised this is much easier since dplyr 1.0, and thought others might be looking too.
You can simply group and filter based on cur_group_id(). If you know the grouping you are after, you could also use cur_group(), although arguably it might be just as easy to filter on what you want directly. I can imagine these being useful in combination if you have a heavily grouped data frame and just want the first group with a confirmed match in a category or two. You would need to be pedantic about what the group order is, though, in my current example.
library(dplyr)
starwars %>% group_by(homeworld, species) %>% filter(cur_group_id() == 1)
#> # A tibble: 3 x 14
#> # Groups: homeworld, species [1]
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Leia~ 150 49 brown light brown 19 fema~ femin~
#> 2 Bail~ 191 NA black tan brown 67 male mascu~
#> 3 Raym~ 188 79 brown light brown NA male mascu~
#> # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
starwars %>% group_by(homeworld, species, eye_color) %>%
filter(grepl("Tatooine Human",paste(cur_group(), collapse = " ") )) %>%
filter(cur_group_id() == 1)
#> # A tibble: 5 x 14
#> # Groups: homeworld, species, eye_color [1]
#> name height mass hair_color skin_color eye_color birth_year sex gender
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 Luke~ 172 77 blond fair blue 19 male mascu~
#> 2 Owen~ 178 120 brown, gr~ light blue 52 male mascu~
#> 3 Beru~ 165 75 brown light blue 47 fema~ femin~
#> 4 Anak~ 188 84 blond fair blue 41.9 male mascu~
#> 5 Clie~ 183 NA brown fair blue 82 male mascu~
#> # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> # vehicles <list>, starships <list>
Created on 2023-08-11 by the reprex package (v0.3.0)
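Applied to the question's mtcars example, the same idea can be sketched as follows. This is a minimal sketch, assuming dplyr 1.0 or later; the variable name first_two is only illustrative. Since cur_group_id() numbers the groups in the order of the grouping keys, a comparison such as <= 2 should keep the first two groups.

```r
library(dplyr)

# Keep only the rows belonging to the first two groups of cyl.
# cur_group_id() returns one integer per group, so the comparison
# yields a single TRUE/FALSE that is recycled over the group's rows.
first_two <- mtcars %>%
  group_by(cyl) %>%
  filter(cur_group_id() <= 2)

# Inspect which groups survived and how many rows each has.
first_two %>% count(cyl)
```

For mtcars this keeps the cyl groups 4 and 6, because the grouping keys are sorted before being numbered.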
Upvotes: 2
Reputation: 11192
With a bit of dplyr along with some nesting/unnesting (supported by the tidyr package), you could write a small helper to get the first (or any) group:
first = function(x) x %>% nest %>% ungroup %>% slice(1) %>% unnest(data)
mtcars %>% group_by(cyl) %>% first()
By adjusting the slicing you could also extract the nth or any range of groups by index, but typically the first or the last is what most users want.
The name is inspired by functional APIs, which all call it first (see the stdlibs of e.g. Kotlin, Python, Scala, Java, Spark).
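The slicing adjustment mentioned above could look roughly like this. It is only a sketch: nth_groups and its idx argument are hypothetical names, not part of dplyr or tidyr. Which groups count as "first" depends on the row order that nest() produces, so it is worth checking that order on your own data before relying on it.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: keep the groups at the given positions.
# nest() collapses each group into one row with a list-column `data`,
# slice() picks rows (i.e. groups) by position, and unnest() expands
# the selected groups back into ordinary rows.
nth_groups <- function(x, idx) {
  x %>%
    nest() %>%
    ungroup() %>%
    slice(idx) %>%
    unnest(data)
}

# Two of the three cyl groups, selected by position.
res <- mtcars %>% group_by(cyl) %>% nth_groups(1:2)
```

Note that the result is ungrouped, so a group_by() has to be re-applied if later steps need the grouping.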
Edit: Faster Version
A more scalable version (>50x faster on large datasets) that avoids nesting would be
first_group = function(x) x %>%
select(group_cols()) %>%
distinct %>%
ungroup %>%
slice(1) %>%
{ semi_join(x, .)}
Another positive side effect of this improved version is that it fails if no grouping is present in x.
Upvotes: 15
Reputation: 269481
Try this, where groups is a vector of group numbers. Here 1:2 means the first two groups:
select_groups <- function(data, groups, ...)
data[sort(unlist(attr(data, "indices")[ groups ])) + 1, ]
mtcars %>% group_by(cyl) %>% select_groups(1:2)
The selected rows appear in the original order. If you prefer that the rows appear in the order in which the groups are specified (e.g. in the above example, the rows of the first group followed by the rows of the second group), then remove the sort.
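As far as I know, the "indices" attribute used above belongs to older dplyr releases; from dplyr 0.8.0 onwards the per-group row numbers are exposed through group_rows(), which returns 1-based positions. Under that assumption, the same idea can be sketched like this (select_groups2 is a hypothetical name chosen to avoid clashing with the function above):

```r
library(dplyr)

# Hypothetical variant of select_groups() for dplyr >= 0.8.0.
# group_rows() gives a list with one integer vector of row positions
# per group; they are already 1-based, so no "+ 1" is needed.
select_groups2 <- function(data, groups) {
  rows <- unlist(group_rows(data)[groups])
  data[sort(rows), ]
}

sel <- mtcars %>% group_by(cyl) %>% select_groups2(1:2)
```

As in the original, dropping the sort() returns the rows group by group rather than in their original order.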
Upvotes: 9