Reputation: 4521
I'll use the built-in chickwts data as an example.
Here's the data; there are 6 feed types.
> head(chickwts)
  weight      feed
1    179 horsebean
2    160 horsebean
3    136 horsebean
4    227 horsebean
5    217 horsebean
6    168 horsebean
> table(chickwts$feed)
   casein horsebean   linseed  meatmeal   soybean sunflower
       12        10        12        11        14        12
What I want is the top rows by weight for every feed type, but with a different number of rows for each feed type. For example:
top_n_feed <- c(
  "casein" = 3,
  "horsebean" = 5,
  "linseed" = 3,
  "meatmeal" = 6,
  "soybean" = 3,
  "sunflower" = 2
)
How can I do this using dplyr?
To get the top n rows of each feed type by weight, I can use the code below, but I'm not sure how to extend it to use a different n for each feed type.
library(dplyr)
chickwts %>%
  group_by(feed) %>%
  slice_max(order_by = weight, n = 5)
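For comparison, here is a minimal base-R sketch of the per-group count (a swapped-in technique, not dplyr): split the rows by feed, sort each piece by weight, and keep that feed's own count with head(). It assumes every name in top_n_feed is a level of chickwts$feed. Unlike slice_max(), head() returns exactly n rows, so ties at the cutoff are not kept.

```r
# Per-feed counts from the question.
top_n_feed <- c(casein = 3, horsebean = 5, linseed = 3,
                meatmeal = 6, soybean = 3, sunflower = 2)

# One data frame per feed level, named after the level.
pieces <- split(chickwts, chickwts$feed)

# Sort each feed's rows by weight (heaviest first) and keep that feed's count.
# head() returns exactly n rows, so ties at the cutoff are not kept.
top_pieces <- lapply(names(top_n_feed), function(f) {
  d <- pieces[[f]]
  head(d[order(d$weight, decreasing = TRUE), ], top_n_feed[[f]])
})

result <- do.call(rbind, top_pieces)
rownames(result) <- NULL
result
```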
Upvotes: 4
Views: 1245
Reputation: 17289
Another way, using split and map2:
library(dplyr)
library(purrr)
chickwts %>%
  filter(feed %in% names(top_n_feed)) %>%
  split(.$feed) %>%
  map2_dfr(top_n_feed[names(.)], ~ slice_max(.x, order_by = weight, n = .y))
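map2_dfr() pairs its two inputs by position, not by name, which is why the counts are reindexed with top_n_feed[names(.)]: split() names its pieces after the factor levels, in level order. A small base-R check of that alignment, using Map() (which also walks its inputs in parallel) and a deliberately reordered, hypothetical counts vector:

```r
# Hypothetical counts, deliberately listed in a different order from the levels.
counts <- c(sunflower = 2, soybean = 3, meatmeal = 6,
            linseed = 3, horsebean = 5, casein = 3)

pieces <- split(chickwts, chickwts$feed)  # names follow levels(chickwts$feed)
aligned <- counts[names(pieces)]          # reorder counts to match the pieces

# Map() pairs by position, so each piece now meets its own count.
kept <- Map(function(d, n) head(d[order(-d$weight), ], n), pieces, aligned)
sapply(kept, nrow)
```

Without the reindexing, the first piece (casein) would be paired with the first count in the vector (sunflower's 2).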
Upvotes: 1
Reputation: 1381
Any time you have a named vector or list, think purrr::imap. Avoid joins if they are not required, particularly when working at scale.
library(dplyr)
library(purrr)
top_n_feed <- c(
  "casein" = 3,
  "horsebean" = 5,
  "linseed" = 3,
  "meatmeal" = 6,
  "soybean" = 3,
  "sunflower" = 2
)
imap_dfr(top_n_feed, ~ filter(chickwts, feed %in% .y) %>%
           slice_max(order_by = weight, n = .x))
   weight      feed
1     404    casein
2     390    casein
3     379    casein
4     227 horsebean
5     217 horsebean
6     179 horsebean
7     168 horsebean
8     160 horsebean
9     309   linseed
10    271   linseed
11    260   linseed
12    380  meatmeal
13    344  meatmeal
14    325  meatmeal
15    315  meatmeal
16    303  meatmeal
17    263  meatmeal
18    329   soybean
19    327   soybean
20    316   soybean
21    423 sunflower
22    392 sunflower
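imap(.x, .f) is shorthand for map2(.x, names(.x), .f): each call gets one element as .x and its name as .y, which is why .y is the feed and .x the count in the call above. The same pairing in base R with Map(), as a minimal illustration:

```r
top_n_feed <- c(casein = 3, horsebean = 5, linseed = 3,
                meatmeal = 6, soybean = 3, sunflower = 2)

# Like imap(): the function sees each value (cf. .x) together with its name (cf. .y).
labels <- Map(function(value, name) paste0(name, ": keep top ", value),
              top_n_feed, names(top_n_feed))
unlist(labels)
```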
Upvotes: 2
Reputation: 388807
Bring top_n_feed into the chickwts data frame and select the top n rows for each group.
library(dplyr)
tibble::enframe(top_n_feed, name = 'feed') %>%
left_join(chickwts, by = 'feed') %>%
group_by(feed) %>%
top_n(first(value), weight)
# feed value weight
# <chr> <dbl> <dbl>
# 1 casein 3 390
# 2 casein 3 379
# 3 casein 3 404
# 4 horsebean 5 179
# 5 horsebean 5 160
# 6 horsebean 5 227
# 7 horsebean 5 217
# 8 horsebean 5 168
# 9 linseed 3 309
#10 linseed 3 260
# … with 12 more rows
For some reason I was not able to make slice_sample work for this example.
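The dplyr documentation describes top_n() (now superseded) as a wrapper around filter() and min_rank(). Writing that filter out directly lets the cutoff differ by group without the superseded function. This sketch also looks the counts up with a named-vector index instead of a join; the helper column n_keep is a hypothetical name. Like top_n(), min_rank() keeps ties at the cutoff.

```r
library(dplyr)

top_n_feed <- c(casein = 3, horsebean = 5, linseed = 3,
                meatmeal = 6, soybean = 3, sunflower = 2)

res <- chickwts %>%
  # Look up each row's count by feed name (hypothetical helper column n_keep).
  mutate(n_keep = top_n_feed[as.character(feed)]) %>%
  group_by(feed) %>%
  # Rank weights within each feed, heaviest = 1; tied weights share a rank.
  filter(min_rank(desc(weight)) <= n_keep) %>%
  ungroup() %>%
  select(-n_keep)
res
```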
Upvotes: 1
Reputation: 206167
This isn't really something that dplyr makes easy. I'd recommend merging in the data and then filtering.
library(dplyr)
tibble(feed = names(top_n_feed), topn = top_n_feed) %>%
  inner_join(chickwts, by = "feed") %>%
  group_by(feed) %>%
  arrange(desc(weight), .by_group = TRUE) %>%
  filter(row_number() <= topn) %>%
  select(-topn)
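On ties: filter(row_number() <= topn) keeps exactly topn rows per feed, breaking ties by row order, whereas slice_max() (with its default with_ties = TRUE) and top_n() can return more rows when weights tie at the cutoff. A toy vector shows the two ranking functions side by side:

```r
library(dplyr)

w <- c(300, 300, 250)   # two weights tied for first place

row_number(desc(w))     # ties broken by position: 1 2 3
min_rank(desc(w))       # ties share a rank:       1 1 3
```

With a cutoff of 1, the row_number() filter keeps one row and the min_rank() filter keeps two.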
Upvotes: 6