Reputation: 23
I am trying to summarise multiple columns based on the top 5 values of each variable in R. An example of the data is below.
df
ID A B C D
A 325 68 8 8
B 308 85 2 7
B 342 99 6 2
A 439 83 9 6
A 278 60 10 2
A 367 78 14 4
C 136 59 12 5
C 259 73 11 4
B 338 79 5 6
B 461 99 3 7
D 364 73 14 4
D 238 80 3 8
A 266 54 10 10
My current code looks like this:
df2 <- df %>% group_by(ID) %>% top_n(5, A) %>% summarise(ATop5 = mean(A))
The output in df2 displays the information I need.
However, I have multiple variables in the original data frame that I want to summarise in the same way and have appear in the same output as df2.
Currently I am producing a separate data frame for each variable and then combining them into a single data frame via the ID column.
Skipping this step would be of great help.
Upvotes: 2
Views: 287
Reputation: 887088
An option with summarise_at:
library(dplyr)
df %>%
group_by(ID) %>%
summarise_at(vars(A:D), ~ mean(tail(sort(.), 5)))
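In dplyr 1.0.0 and later, summarise_at() is superseded by across(). A minimal sketch of the same idea, assuming a recent dplyr; the example data from the question is read in with read.table so the snippet runs on its own, and the name res is only illustrative:

```r
library(dplyr)

# Example data from the question, rebuilt so the snippet is self-contained
df <- read.table(header = TRUE, text = "
ID A B C D
A 325 68 8 8
B 308 85 2 7
B 342 99 6 2
A 439 83 9 6
A 278 60 10 2
A 367 78 14 4
C 136 59 12 5
C 259 73 11 4
B 338 79 5 6
B 461 99 3 7
D 364 73 14 4
D 238 80 3 8
A 266 54 10 10
")

# Mean of the (up to) five largest values of each column, per ID
res <- df %>%
  group_by(ID) %>%
  summarise(across(A:D, ~ mean(tail(sort(.x), 5))))
```

Note that top_n() keeps ties, so it can return more than five rows per group, whereas tail(sort(x), 5) never uses more than five values.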
Upvotes: 1
Reputation: 39657
In base R you can use aggregate with the formula . ~ ID to apply a function to all remaining columns by group.
aggregate(. ~ ID, df, function(x) mean(tail(sort(x),5)))
#  ID      A    B    C   D
#1  A 335.00 68.6 10.2 6.0
#2  B 362.25 90.5  4.0 5.5
#3  C 197.50 66.0 11.5 4.5
#4  D 301.00 76.5  8.5 6.0
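To see what the anonymous function does on its own, here is a small sketch; top5_mean is a hypothetical helper name, not part of either answer:

```r
# Hypothetical helper: mean of the n largest values of x.
# sort() drops NA values by default, and tail() returns
# everything when fewer than n values remain.
top5_mean <- function(x, n = 5) mean(tail(sort(x), n))

top5_mean(c(325, 439, 278, 367, 266))  # group A, column A: 335
top5_mean(c(136, 259))                 # group C, column A: 197.5
top5_mean(1:7)                         # mean of 3, 4, 5, 6, 7: 5
```

Groups with fewer than five rows (B, C and D in the example data) are therefore averaged over all of their rows.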
Upvotes: 1