Reputation: 2469
I have a data frame with 2 columns one with numeric values and one with a name. The name repeats itself but has different values each time.
Data <- data.frame(
Value = c(1:10),
Name = rep(LETTERS, each=4)[1:10])
I would like to write a function that takes the 3 highest numbers for each name and calculates mean and median (and in case there aren’t 3 values present throw an NA) and then take all the values for each name and calculate mean and median. My initial attempt looks something like this:
my.mean <- function (x,y){
top3.x <- ifelse(x > 3 , NA, x)
return(mean(top3.x), median(top3.x))
}
Any hints on how to improve this will be appreciated.
Upvotes: 1
Views: 3222
Reputation: 118799
Here's a data.table
solution, assuming that you don't have any other NAs in your data:
require(data.table) ## 1.9.2+
setDT(Data) ## convert to data.table
Data[order(Name, -Value)][, list(m1=mean(Value[1:3]), m2=median(Value[1:3])), by=Name]
# Name m1 m2
# 1: A 3 3
# 2: B 7 7
# 3: C NA NA
Upvotes: 2
Reputation: 887193
Using dplyr
library(dplyr)
myFun1 <- function(dat){
dat %>%
group_by(Name)%>%
arrange(desc(Value))%>%
mutate(n=n(), Value=ifelse(n<=3, NA_integer_, Value))%>%
summarize(Mean=mean(head(Value,3)), Median=median(head(Value,3)))
}
myFun1(Data)
#Source: local data frame [3 x 3]
# Name Mean Median
#1 A 3 3
#2 B 7 7
#3 C NA NA
Upvotes: 1
Reputation: 193527
I would probably recommend by
for this.
Something put together really quickly might look like this (if I understood your question correctly):
myFun <- function(indf) {
do.call(rbind, with(indf, by(Value, Name, FUN=function(x) {
Vals <- head(sort(x, decreasing=TRUE), 3)
if (length(Vals) < 3) {
c(Mean = NA, Median = NA)
} else {
c(Mean = mean(Vals), Median = median(Vals))
}
})))
}
myFun(Data)
# Mean Median
# A 3 3
# B 7 7
# C NA NA
Note that it is not a very useful function in this form because of how many parameters are hard-coded into the function. It's really only useful if your data is in the form you shared.
Upvotes: 2