Kris Aldred

Reputation: 23

Summarise multiple columns in R based on top 5 values

I am trying to summarise multiple columns in R based on the top 5 values of each variable. An example of the data is below.

df

ID  A   B   C   D
A   325 68  8   8
B   308 85  2   7
B   342 99  6   2
A   439 83  9   6
A   278 60  10  2
A   367 78  14  4
C   136 59  12  5
C   259 73  11  4
B   338 79  5   6
B   461 99  3   7
D   364 73  14  4
D   238 80  3   8
A   266 54  10  10

My current code looks like this:

    df2 <- df %>% group_by(ID) %>% top_n(5, A) %>% summarise(ATop5 = mean(A))

The output in df2 shows the information I need.

However, the original data frame has several more variables that I want to summarise the same way and have appear in the same output as df2.

Currently I am producing a separate data frame for each variable and then combining them into a single data frame via the ID column.

Being able to skip this step would be a great help.

Upvotes: 2

Views: 287

Answers (2)

akrun

Reputation: 887088

An option with summarise_at:

    library(dplyr)
    df %>%
       group_by(ID) %>%
       # for each of A to D: sort ascending, keep the last (largest) 5, average them
       summarise_at(vars(A:D), ~ mean(tail(sort(.), 5)))
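
In dplyr 1.0.0 and later, summarise_at is superseded by across(), so the same idea could be sketched as below. The helper name top_mean and the "Top5" column suffix are assumptions made for illustration (the suffix mirrors the ATop5 name in the question); the data frame is the sample data from the question.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  ID = c("A", "B", "B", "A", "A", "A", "C", "C", "B", "B", "D", "D", "A"),
  A  = c(325, 308, 342, 439, 278, 367, 136, 259, 338, 461, 364, 238, 266),
  B  = c(68, 85, 99, 83, 60, 78, 59, 73, 79, 99, 73, 80, 54),
  C  = c(8, 2, 6, 9, 10, 14, 12, 11, 5, 3, 14, 3, 10),
  D  = c(8, 7, 2, 6, 2, 4, 5, 4, 6, 7, 4, 8, 10)
)

# Hypothetical helper (not part of dplyr): mean of the n largest values of x.
# sort() drops NA values by default, so missing values are ignored here.
top_mean <- function(x, n = 5) mean(tail(sort(x), n))

res <- df %>%
  group_by(ID) %>%
  # apply top_mean to columns A through D and name the results ATop5, BTop5, ...
  summarise(across(A:D, top_mean, .names = "{.col}Top5"))

res
```

One difference from the top_n approach in the question: top_n keeps extra rows when there are ties at the cut-off, while tail(sort(x), 5) always uses at most 5 values, so the two can give different means when a group has tied values.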

Upvotes: 1

GKi

Reputation: 39657

In base R you can use aggregate with the formula . ~ ID, which applies a function to every remaining column, grouped by ID.

    aggregate(. ~ ID, df, function(x) mean(tail(sort(x), 5)))
    #  ID      A    B    C   D
    #1  A 335.00 68.6 10.2 6.0
    #2  B 362.25 90.5  4.0 5.5
    #3  C 197.50 66.0 11.5 4.5
    #4  D 301.00 76.5  8.5 6.0
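
If only some of the columns are wanted, the left-hand side of the formula can name them with cbind() instead of using the dot. A minimal sketch, assuming the sample data from the question and picking columns A and B purely as an illustration:

```r
# Sample data copied from the question
df <- data.frame(
  ID = c("A", "B", "B", "A", "A", "A", "C", "C", "B", "B", "D", "D", "A"),
  A  = c(325, 308, 342, 439, 278, 367, 136, 259, 338, 461, 364, 238, 266),
  B  = c(68, 85, 99, 83, 60, 78, 59, 73, 79, 99, 73, 80, 54),
  C  = c(8, 2, 6, 9, 10, 14, 12, 11, 5, 3, 14, 3, 10),
  D  = c(8, 7, 2, 6, 2, 4, 5, 4, 6, 7, 4, 8, 10)
)

# cbind(A, B) ~ ID summarises only columns A and B within each ID group
res <- aggregate(cbind(A, B) ~ ID, df, function(x) mean(tail(sort(x), 5)))

res
```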

Upvotes: 1
