Reputation: 121
I want to calculate the median of a frequency distribution for a large number of samples. Each sample has a number of classes (three in the example) and their respective frequencies. Each class is associated with a different value:
data <- data.frame(sample=c(1,2,3,4,5),
freq_class1=c(1,1,59,10,2),
freq_class2=c(1,0,35,44,22),
freq_class3=c(0,4,1,9,2),
value_class1=c(12,11,14,11,13),
value_class2=c(27,33,34,31,29),
value_class3=c(75,78,88,81,65))
For example, the median of sample 1 would be 19.5. I assume this can be done using quantile() on the frequency distribution of each sample, but all my attempts failed.
Do you have any suggestions?
Upvotes: 2
Views: 2562
Reputation: 3866
This is probably not the most elegant way, but it works. I recreate the full data vector from the information contained in the data.frame, then find the median of that. Writing a function to do it lets me use apply to quickly run it on each row of the data.frame.
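find.median <- function(x) {
  # x is one row of data: columns 2:4 hold the frequencies, columns 5:7 the class values
  # repeat each class value as many times as its frequency
  full.x <- rep(x[5:7], times = x[2:4])
  return(median(full.x))
}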
> apply(data,1,find.median)
[1] 19.5 78.0 14.0 31.0 29.0
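If the frequencies are very large, expanding every row with rep can use a lot of memory. As a rough sketch of an alternative (not part of the approach above), the median can be read off the cumulative frequencies instead. The helper name freq_median and the column-name vectors freq_cols and value_cols are hypothetical names introduced here for illustration; the example assumes the data.frame from the question and non-negative integer frequencies.

```r
# Hypothetical helper: median of a frequency distribution computed from
# cumulative frequencies, without expanding the data with rep().
freq_median <- function(values, freqs) {
  keep <- freqs > 0
  if (!any(keep)) return(NA_real_)   # no observations at all
  values <- values[keep]
  freqs <- freqs[keep]
  ord <- order(values)               # classes must be sorted by value
  values <- values[ord]
  cum <- cumsum(freqs[ord])
  n <- cum[length(cum)]              # total number of observations
  # Ranks of the two middle observations (the same rank when n is odd)
  lo <- values[which(cum >= floor((n + 1) / 2))[1]]
  hi <- values[which(cum >= ceiling((n + 1) / 2))[1]]
  (lo + hi) / 2
}

# Data from the question
data <- data.frame(sample = c(1, 2, 3, 4, 5),
                   freq_class1 = c(1, 1, 59, 10, 2),
                   freq_class2 = c(1, 0, 35, 44, 22),
                   freq_class3 = c(0, 4, 1, 9, 2),
                   value_class1 = c(12, 11, 14, 11, 13),
                   value_class2 = c(27, 33, 34, 31, 29),
                   value_class3 = c(75, 78, 88, 81, 65))

# Select columns by name rather than by position (assumed naming scheme)
freq_cols <- c("freq_class1", "freq_class2", "freq_class3")
value_cols <- c("value_class1", "value_class2", "value_class3")

medians <- sapply(seq_len(nrow(data)), function(i) {
  freq_median(unlist(data[i, value_cols], use.names = FALSE),
              unlist(data[i, freq_cols], use.names = FALSE))
})
medians
```

On the example data this gives the same medians as the rep-based version. Selecting the columns by name also avoids depending on their positions in the data.frame.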
Upvotes: 4