
Map top_n function to grouped data

Suppose I have a dataframe:

value = c(1:5,16:20, 26:30)
group = c(rep("A", 5), rep("B", 5), rep("C", 5))
df = data.frame(value, group)

I would like to create a new dataframe containing the top n values of each group, where n = 3 for group A, n = 2 for group B and n = 1 for group C.

The new dataframe should look like this:

  value group
1     5     A
2     4     A
3     3     A
4    20     B
5    19     B
6    30     C

I think I should map the top_n function over my data, but I am struggling to find the right implementation.

Upvotes: 2

Answers (5)

A. Suliman

Reputation: 13125

Using map and top_n

library(tidyverse)
df %>%
  nest(data = -group) %>%   # tidyr >= 1.0.0 syntax
  mutate(dt = map(data, ~top_n(.x, n = .x$n[1], wt = value))) %>%
  select(-data) %>%
  unnest(cols = dt)

# Using map_df
map_df(df %>% group_split(group), ~top_n(.x, n = .x$n[1], wt = value))

# A tibble: 6 x 3
  value group     n
  <int> <chr> <dbl>
1     3 A         3
2     4 A         3
3     5 A         3
4    19 B         2
5    20 B         2
6    30 C         1

Data

value = c(1:5,16:20, 26:30)
group = c(rep("A", 5), rep("B", 5), rep("C", 5))
n = c(rep(3, 5), rep(2, 5), rep(1, 5))
df = data.frame(value, group,n,stringsAsFactors = FALSE)
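If you would rather not type out the n column by hand, one option is to derive it from a named vector and then run the same map_df() call. This is a sketch assuming dplyr >= 0.8 and purrr; `n_lookup` is a hypothetical name:

```r
library(dplyr)
library(purrr)

value <- c(1:5, 16:20, 26:30)
group <- c(rep("A", 5), rep("B", 5), rep("C", 5))
# stringsAsFactors = FALSE so that group indexes n_lookup by name, not by factor code
df <- data.frame(value, group, stringsAsFactors = FALSE)

# Hypothetical named lookup vector: how many rows to keep for each group
n_lookup <- c(A = 3, B = 2, C = 1)

# Look up n for every row by its group name
df$n <- unname(n_lookup[df$group])

# Same map_df() + top_n() idea as above, reading n from the first row of each group
res <- map_df(group_split(df, group), ~ top_n(.x, n = .x$n[1], wt = value))
res
```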

Upvotes: 3

jay.sf

Reputation: 72613

You could use tail in a Map call.

do.call(rbind, Map(tail, split(df, df$group), 3:1))
#      value group
# A.3      3     A
# A.4      4     A
# A.5      5     A
# B.9     19     B
# B.10    20     B
# C       30     C

Note: if the data are not already sorted as nicely as in the given example, sort them first, e.g. df <- with(df, df[order(group, value), ]).

Data

df <- structure(list(value = c(1L, 2L, 3L, 4L, 5L, 16L, 17L, 18L, 19L, 
20L, 26L, 27L, 28L, 29L, 30L), group = structure(c(1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", 
"B", "C"), class = "factor")), class = "data.frame", row.names = c(NA, 
-15L))
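To see why the sorting note matters, here is a sketch that shuffles the example data first (the shuffle and the seed are only for illustration) and then sorts before calling tail():

```r
value <- c(1:5, 16:20, 26:30)
group <- c(rep("A", 5), rep("B", 5), rep("C", 5))
df <- data.frame(value, group)

# Shuffle the rows to mimic data that is not pre-sorted (illustration only)
set.seed(42)
df_shuffled <- df[sample(nrow(df)), ]

# Sort by group, then by ascending value, so the largest values come last
df_sorted <- with(df_shuffled, df_shuffled[order(group, value), ])

# Map() pairs each group's data frame with one n: tail(A, 3), tail(B, 2), tail(C, 1)
res <- do.call(rbind, Map(tail, split(df_sorted, df_sorted$group), 3:1))
res
```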

Upvotes: 5

LyzandeR

Reputation: 37879

You can use a one-liner in base R. I think it might be more complicated with dplyr here.

#split the df on group and then subset each group
mylist <- Map(function(x, y) x[order(x$value, decreasing = TRUE)[1:y], ], split(df, df$group), 3:1)
do.call(rbind, mylist)

#      value group
# A.5      5     A
# A.4      4     A
# A.3      3     A
# B.10    20     B
# B.9     19     B
# C       30     C

Since you are already using dplyr, you can also use bind_rows:

bind_rows(Map(function(x, y) x[order(x$value, decreasing = TRUE)[1:y], ], split(df, df$group), 3:1))
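One caveat with `[1:y]`: if a group has fewer than y rows, the extra indices are NA and produce all-NA rows. A sketch of a guarded version, where `top_y` is a hypothetical helper and group C deliberately asks for 7 rows although it only has 5:

```r
value <- c(1:5, 16:20, 26:30)
group <- c(rep("A", 5), rep("B", 5), rep("C", 5))
df <- data.frame(value, group)

# Hypothetical helper: keep the y largest values of x, but never more rows than x has
top_y <- function(x, y) {
  x[order(x$value, decreasing = TRUE)[seq_len(min(y, nrow(x)))], ]
}

# Group C requests 7 rows; the cap returns its 5 rows instead of 2 extra NA rows
res <- do.call(rbind, Map(top_y, split(df, df$group), c(3, 2, 7)))
res
```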

Upvotes: 2

Ronak Shah

Reputation: 388817

I would prefer to add n as a column of the dataframe, then arrange and slice:

library(dplyr)

df %>%
   mutate(n = case_when(group == "A"~3L, 
                        group == "B"~ 2L, 
                        TRUE ~ 1L)) %>%
   arrange(group, desc(value)) %>%
   group_by(group) %>%
   slice(seq_len(n[1L])) %>%
   select(-n)


#  value group
#  <int> <fct>
#1     5 A    
#2     4 A    
#3     3 A    
#4    20 B    
#5    19 B    
#6    30 C
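The case_when() step hardcodes n for each group. If the n values live in their own table, a join does the same job. Here is a sketch assuming a hypothetical lookup table `n_tbl`:

```r
library(dplyr)

value <- c(1:5, 16:20, 26:30)
group <- c(rep("A", 5), rep("B", 5), rep("C", 5))
df <- data.frame(value, group, stringsAsFactors = FALSE)

# Hypothetical lookup table holding the desired n for each group
n_tbl <- data.frame(group = c("A", "B", "C"), n = c(3L, 2L, 1L),
                    stringsAsFactors = FALSE)

res <- df %>%
  left_join(n_tbl, by = "group") %>%   # attach n to every row
  arrange(group, desc(value)) %>%      # largest values first within each group
  group_by(group) %>%
  slice(seq_len(n[1L])) %>%            # keep the first n rows of each group
  ungroup() %>%
  select(-n)
res
```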

Upvotes: 3

Colin FAY

Reputation: 5109

Here's an implementation with {dplyr} >= 0.8 and {purrr}:

value = c(1:5,16:20, 26:30)
group = c(rep("A", 5), rep("B", 5), rep("C", 5))
df = data.frame(value, group)
library(dplyr)
library(purrr)
df %>% 
  group_by(group) %>%
  group_split() %>%
  map2_df(., length(.):1, ~ top_n(.x, .y, value) %>% arrange(desc(value)))
#> # A tibble: 6 x 2
#>   value group
#>   <int> <fct>
#> 1     5 A    
#> 2     4 A    
#> 3     3 A    
#> 4    20 B    
#> 5    19 B    
#> 6    30 C

Note that top_n() doesn't sort its result, so you have to combine top_n() with arrange() to get the rows in descending order.
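A small sketch of that ordering behaviour on group A alone; the slice_max() line assumes dplyr >= 1.0.0, where slice_max() supersedes top_n():

```r
library(dplyr)

df_a <- data.frame(value = 1:5, group = "A")

# top_n() keeps the selected rows in their original order: 3 4 5
kept <- top_n(df_a, 3, value)$value

# Adding arrange() puts them in descending order: 5 4 3
sorted <- (top_n(df_a, 3, value) %>% arrange(desc(value)))$value

# slice_max() selects and sorts in one step: 5 4 3
sliced <- slice_max(df_a, value, n = 3)$value
```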

Another suggestion, using base R's split() and mapply() (top_n() and the pipe still come from dplyr):

x <- df %>%
  split(df$group)
mapply(function(x, y){
  top_n(x, y, value)
}, x = x, y = length(x):1, SIMPLIFY = FALSE) %>%
  do.call(rbind, .)
#>     value group
#> A.1     3     A
#> A.2     4     A
#> A.3     5     A
#> B.1    19     B
#> B.2    20     B
#> C      30     C

Upvotes: 2
