Reputation: 329
Suppose I have a dataframe:
value = c(1:5,16:20, 26:30)
group = c(rep("A", 5), rep("B", 5), rep("C", 5))
df = data.frame(value, group)
I would like to create a new dataframe that would contain top_n values for each group such that n for A group = 3, n for B group = 2 and n for C group = 1.
# new dataframe should look like this:
value group
1 5 A
2 4 A
3 3 A
4 20 B
5 19 B
6 30 C
I think I should map top_n function to my data, but I am struggling to find the right implementation.
Upvotes: 2
Views: 324
Reputation: 13125
Using map and top_n (this assumes the per-group n is stored as a column of df, as in the Data section below):
library(tidyverse)
df %>% nest(-group) %>%
mutate(dt = map(data, ~top_n(.x, n=.x$n[1], wt=value))) %>%
unnest(dt)
#Using map_df
map_df(df %>% group_split(group), ~top_n(.x, n=.x$n[1], wt=value))
# A tibble: 6 x 3
value group n
<int> <chr> <dbl>
1 3 A 3
2 4 A 3
3 5 A 3
4 19 B 2
5 20 B 2
6 30 C 1
Data
value = c(1:5,16:20, 26:30)
group = c(rep("A", 5), rep("B", 5), rep("C", 5))
n = c(rep(3, 5), rep(2, 5), rep(1, 5))
df = data.frame(value, group,n,stringsAsFactors = FALSE)
Upvotes: 3
Reputation: 72613
You could use tail in a Map call.
do.call(rbind, Map(tail, split(df, df$group), 3:1))
# value group
# A.3 3 A
# A.4 4 A
# A.5 5 A
# B.9 19 B
# B.10 20 B
# C 30 C
Note: sort beforehand if the data is not as nicely sorted as in the given example, e.g. df <- with(df, df[order(group, value), ]).
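To illustrate the note above, here is a minimal sketch that shuffles the example data first, so that the sorting step actually matters. The name df_shuffled and the seed are assumptions made for illustration only; they are not part of the original answer.

```r
# Rebuild the example data and shuffle the rows (illustrative only)
set.seed(42)
value <- c(1:5, 16:20, 26:30)
group <- rep(c("A", "B", "C"), each = 5)
df_shuffled <- data.frame(value, group)[sample(15), ]

# Sort by group, then by value, so the largest values sit at the end of each group
df_sorted <- with(df_shuffled, df_shuffled[order(group, value), ])

# tail() now picks the top 3, 2 and 1 values for A, B and C respectively
res <- do.call(rbind, Map(tail, split(df_sorted, df_sorted$group), 3:1))
res$value
# 3 4 5 19 20 30
```

Without the order() step, tail() would simply return the last rows of each shuffled group, which are not necessarily the largest values.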
Data
df <- structure(list(value = c(1L, 2L, 3L, 4L, 5L, 16L, 17L, 18L, 19L,
20L, 26L, 27L, 28L, 29L, 30L), group = structure(c(1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor")), class = "data.frame", row.names = c(NA,
-15L))
Upvotes: 5
Reputation: 37879
You can use a one-liner in base R. I think it might be more complicated with dplyr here.
#split the df on group and then subset each group
mylist <- Map(function(x, y) x[order(x$value, decreasing = TRUE)[1:y], ], split(df, df$group), 3:1)
do.call(rbind, mylist)
# value group
#1 5 A
#2 4 A
#3 3 A
#4 20 B
#5 19 B
#6 30 C
Since you are already using dplyr, you can use bind_rows instead:
bind_rows(Map(function(x, y) x[order(x$value, decreasing = TRUE)[1:y], ], split(df, df$group), 3:1))
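The positional 3:1 only works because split() happens to return the groups in the order A, B, C. A possible variation, sketched here under the assumption that the counts are kept in a named vector (n_per_group is a hypothetical name, not from the original post), looks the counts up by group name. It also uses head() rather than [1:y], because indexing past the end of a group would produce NA rows if a group had fewer rows than requested.

```r
value <- c(1:5, 16:20, 26:30)
group <- rep(c("A", "B", "C"), each = 5)
df <- data.frame(value, group)

# Hypothetical lookup: number of rows to keep, keyed by group name (any order)
n_per_group <- c(C = 1, A = 3, B = 2)

parts <- split(df, df$group)

# Sort each group by descending value and keep the first n rows;
# n_per_group[names(parts)] aligns the counts with the split order
top_rows <- Map(
  function(x, n) head(x[order(x$value, decreasing = TRUE), ], n),
  parts,
  n_per_group[names(parts)]
)
res <- do.call(rbind, top_rows)
res$value
# 5 4 3 20 19 30
```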
Upvotes: 2
Reputation: 388817
I would prefer to add n to the dataframe, then arrange and slice:
library(dplyr)
df %>%
mutate(n = case_when(group == "A"~3L,
group == "B"~ 2L,
TRUE ~ 1L)) %>%
arrange(group, desc(value)) %>%
group_by(group) %>%
slice(seq_len(n[1L])) %>%
select(-n)
# value group
# <int> <fct>
#1 5 A
#2 4 A
#3 3 A
#4 20 B
#5 19 B
#6 30 C
Upvotes: 3
Reputation: 5109
Here's the implementation with {dplyr} >= 0.8 and {purrr}:
value = c(1:5,16:20, 26:30)
group = c(rep("A", 5), rep("B", 5), rep("C", 5))
df = data.frame(value, group)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
df %>%
group_by(group) %>%
group_split() %>%
map2_df(., length(.):1, ~ top_n(.x, .y, value) %>% arrange(desc(value)))
#> # A tibble: 6 x 2
#> value group
#> <int> <fct>
#> 1 5 A
#> 2 4 A
#> 3 3 A
#> 4 20 B
#> 5 19 B
#> 6 30 C
Note that top_n() doesn't order the data, so you have to combine top_n() and arrange().
Another suggestion using base R's split, mapply and do.call (still relying on top_n from dplyr):
x <- df %>%
split(df$group)
mapply(function(x, y){
top_n(x, y, value)
}, x = x, y = length(x):1, SIMPLIFY = FALSE) %>%
do.call(rbind, .)
value group
A.1 3 A
A.2 4 A
A.3 5 A
B.1 19 B
B.2 20 B
C 30 C
Upvotes: 2