Ryan C. Thompson

Reputation: 42080

How can I easily get the mean, median, quartiles, etc. given counts of each value in R?

Suppose I have a data frame with a column for values and another column for the number of times that value was observed:

x <- data.frame(value=c(1,2,3), count=c(4,2,1))
x
#   value count
# 1     1     4
# 2     2     2
# 3     3     1

I know that I can get the weighted mean of the data using weighted.mean and the weighted median using the weighted.median function provided by several packages (e.g. limma), but how can I get other weighted statistics on my data, such as 1st and 3rd quartiles, and maybe standard deviation? "Expanding" the data using rep is not an option because sum(x$count) is about 3 billion (the size of the human genome).

Upvotes: 4

Views: 5844

Answers (4)

Ryan C. Thompson

Reputation: 42080

For completeness, I'll note that the S4Vectors package in Bioconductor provides an answer in the form of the "Rle" class, which lets you construct a run-length encoded vector that supports all the usual operations:

library(S4Vectors)  # Bioconductor package
x <- data.frame(value=c(1,2,3), count=c(4,2,1))
y <- Rle(x$value, x$count)  # run-length encoded vector: 1 1 1 1 2 2 3
mean(y)
median(y)
quantile(y)

Upvotes: 0

I Like to Code

Reputation: 7251

To complement Prasad Chalasani's answer, here is code that computes the weighted median from a column of values and a column giving the number of times each value was observed. It uses the wtd.quantile function from the Hmisc package.

library(Hmisc)  # provides wtd.quantile()

x <- data.frame(value=c(1,2,3), count=c(4,2,1))
##   value count
## 1     1     4
## 2     2     2
## 3     3     1

wtd.quantile(x$value, x$count, probs = 0.5)
## 50% 
##   1 
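The question also asks about the standard deviation, which needs no package at all: each count is simply a frequency weight, so the mean and the (n - 1)-denominator variance can be computed directly from the two columns. The helper count_moments() below is a hypothetical sketch, not part of Hmisc or any other package; it is intended to agree with mean() and sd() applied to the expanded vector while never building that vector.

```r
# Hypothetical helper: mean, variance and SD of data given as value/count pairs,
# computed without expanding the data with rep().
count_moments <- function(value, count) {
  count <- as.numeric(count)                 # doubles avoid integer overflow for ~3e9 totals
  n <- sum(count)                            # total number of observations
  m <- sum(value * count) / n                # frequency-weighted mean
  v <- sum(count * (value - m)^2) / (n - 1)  # sample variance, n - 1 denominator like var()
  c(n = n, mean = m, var = v, sd = sqrt(v))
}

x <- data.frame(value = c(1, 2, 3), count = c(4, 2, 1))
count_moments(x$value, x$count)
```

Only one pass over the distinct values is needed, so the cost depends on nrow(x), not on sum(x$count).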

Upvotes: 1

aL3xa

Reputation: 36110

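Or expand the counts back into individual observations with rep() and run the analysis the usual way (this only works when sum(count) is small enough for the expanded vector to fit in memory):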

dtf <- data.frame(value = 1:3, count = c(4, 2, 1))
x <- with(dtf, rep(value, count))  # expanded data: 1 1 1 1 2 2 3
summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.571   2.000   3.000 
fivenum(x)
## [1] 1 1 1 2 3
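The same quartiles can be obtained without materializing the expanded vector: the k-th smallest observation is the value whose cumulative count first reaches k, and findInterval() can look that up directly. The helper count_quantile() below is a hypothetical sketch (not from any package) that applies R's default type-7 interpolation rule to those order statistics, so it should reproduce quantile(x) and the quartiles in summary(x) while only touching one element per distinct value.

```r
# Hypothetical helper: type-7 quantiles (R's default) for value/count data,
# computed from cumulative counts instead of rep().
count_quantile <- function(value, count, probs = c(0, 0.25, 0.5, 0.75, 1)) {
  o <- order(value)                     # order statistics need sorted values
  value <- value[o]
  cum <- cumsum(as.numeric(count[o]))   # doubles: safe for totals near 3e9
  n <- cum[length(cum)]
  # k-th smallest observation: first value whose cumulative count is >= k
  order_stat <- function(k) value[findInterval(k - 1, cum) + 1]
  h <- (n - 1) * probs + 1              # type-7 position (1-based)
  lo <- floor(h)
  hi <- ceiling(h)
  q <- order_stat(lo) + (h - lo) * (order_stat(hi) - order_stat(lo))
  setNames(q, paste0(100 * probs, "%"))
}

dtf <- data.frame(value = 1:3, count = c(4, 2, 1))
count_quantile(dtf$value, dtf$count)
```

Values with a zero count are handled naturally, because findInterval() skips over repeated cumulative counts.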

Upvotes: 1

Prasad Chalasani

Reputation: 20282

Have you tried these packages?

  1. Hmisc: provides several weighted statistics, including weighted means (wtd.mean), weighted variances (wtd.var) and weighted quantiles (wtd.quantile).

  2. laeken: provides weighted quantiles.

Upvotes: 7
