shiny

Reputation: 3502

Calculate the mean of values assigned to each quantile in different quantile types?

I want to compare the 9 types of quantiles.

I calculated the quantiles for variable a in a data.frame. For each type (1-9), I assigned every value to one of 10 decile groups (with 1 as the highest 10% and 10 as the lowest 10%).

set.seed(123)
library(dplyr)
a <- as.numeric(sample(1.1e6:87e6, 366, replace=T))
b <- runif(366, 0.005, 2.3)
df<- data.frame(a,b)
df <- df %>% 
      mutate(type1 = 11 - as.integer(cut(a, quantile(a, probs=0:10/10, type = 1), include.lowest=TRUE)),  
             type2 = 11 - as.integer(cut(a, quantile(a, probs=0:10/10, type = 2), include.lowest=TRUE)),
             type3 = 11 - as.integer(cut(a, quantile(a, probs=0:10/10, type = 3), include.lowest=TRUE)),
             type4 = 11 - as.integer(cut(a, quantile(a, probs=0:10/10, type = 4), include.lowest=TRUE)),
             type5 = 11 - as.integer(cut(a, quantile(a, probs=0:10/10, type = 5), include.lowest=TRUE)),
             type6 = 11 - as.integer(cut(a, quantile(a, probs=0:10/10, type = 6), include.lowest=TRUE)),
             type7 = 11 - as.integer(cut(a, quantile(a, probs=0:10/10, type = 7), include.lowest=TRUE)),
             type8 = 11 - as.integer(cut(a, quantile(a, probs=0:10/10, type = 8), include.lowest=TRUE)),
             type9 = 11 - as.integer(cut(a, quantile(a, probs=0:10/10, type = 9), include.lowest=TRUE)))

I want to calculate the mean of a within each of the 10 decile groups, for each of the 9 types. That should give 90 means of a in total.
How can I do that?

Upvotes: 0

Views: 351

Answers (2)

alistaire

Reputation: 43344

Continuing with dplyr, you can use lapply to loop over the quantile columns, group_by_ each one in turn, and summarise to calculate the grouped means. do.call(cbind, ...) then binds the resulting columns of means into a single data.frame.

# for each type column: group by it, then keep only the column of group means
means_a <- do.call(cbind, lapply(names(df)[3:11], function(x) {
    group_by_(df, x) %>%
        summarise(m = mean(a)) %>%
        select(m)
}))
# clean up names
names(means_a) <- names(df)[3:11]

You're left with

> means_a
      type1    type2    type3    type4    type5    type6    type7    type8    type9
1  82835646 82835646 82704531 82704531 82704531 82835646 82704531 82835646 82835646
2  73922430 73922430 73809597 73674619 73809597 73922430 73809597 73922430 73922430
3  64571479 64571479 64449537 64328263 64449537 64449537 64449537 64449537 64449537
4  56421583 56421583 56320527 56207920 56320527 56320527 56320527 56320527 56320527
5  47065506 47065506 47065506 46924157 47065506 47065506 47065506 47065506 47065506
6  38559879 38559879 38468169 38468169 38468169 38468169 38559879 38468169 38468169
7  31639898 31639898 31541934 31442833 31541934 31541934 31639898 31541934 31541934
8  23589748 23589748 23495235 23373569 23495235 23495235 23589748 23495235 23495235
9  15766101 15766101 15645916 15535787 15645916 15535787 15766101 15535787 15645916
10  6637675  6637675  6637675  6500634  6637675  6500634  6637675  6500634  6637675
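
For readers on a newer dplyr: the underscored verbs such as group_by_ have been deprecated since dplyr 0.7. Below is a minimal sketch of the same idea using group_by(across(all_of(x))), assuming dplyr 1.0.0 or later. The names type_cols and k are only illustrative, and the data is rebuilt with a loop so the snippet runs on its own. Because R's default sample() algorithm changed in R 3.6.0, the exact means may differ from the table above.

```r
library(dplyr)

# Rebuild the question's data (only column a is needed here).
set.seed(123)
a <- as.numeric(sample(1.1e6:87e6, 366, replace = TRUE))
df <- data.frame(a)

# Decile group per quantile type: 1 = highest 10%, 10 = lowest 10%.
type_cols <- paste0("type", 1:9)  # illustrative helper name
for (k in 1:9) {
  breaks <- quantile(a, probs = 0:10 / 10, type = k)
  df[[type_cols[k]]] <- 11L - as.integer(cut(a, breaks, include.lowest = TRUE))
}

# Group by one type column at a time, selected by its name as a string,
# and keep only the column of group means.
means_a <- do.call(cbind, lapply(type_cols, function(x) {
  df %>%
    group_by(across(all_of(x))) %>%
    summarise(m = mean(a)) %>%
    select(m)
}))
names(means_a) <- type_cols
```

means_a should again have 10 rows (decile groups) and 9 columns (types).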

Upvotes: 1

Gopala

Reputation: 10483

This is one approach that produces the desired 90 means:

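# decile group (1 = highest 10%, 10 = lowest 10%) of x for a given quantile type
f <- function(type, x) {
  breaks <- quantile(x, probs = 0:10/10, type = type)
  11 - as.integer(cut(x, breaks, include.lowest = TRUE))
}

set.seed(123)
a <- as.numeric(sample(1.1e6:87e6, 366, replace = TRUE))
b <- runif(366, 0.005, 2.3)
df <- data.frame(a, b)
# one column of decile groups per type (named X1..X9 by data.frame)
df <- cbind(df, data.frame(sapply(1:9, f, x = df$a)))
# mean of a within each decile group, for each type column
sapply(df[, 3:11], function(x) tapply(df$a, x, mean))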
             X1       X2       X3       X4       X5       X6       X7       X8       X9
1  82835646 82835646 82704531 82704531 82704531 82835646 82704531 82835646 82835646
2  73922430 73922430 73809597 73674619 73809597 73922430 73809597 73922430 73922430
3  64571479 64571479 64449537 64328263 64449537 64449537 64449537 64449537 64449537
4  56421583 56421583 56320527 56207920 56320527 56320527 56320527 56320527 56320527
5  47065506 47065506 47065506 46924157 47065506 47065506 47065506 47065506 47065506
6  38559879 38559879 38468169 38468169 38468169 38468169 38559879 38468169 38468169
7  31639898 31639898 31541934 31442833 31541934 31541934 31639898 31541934 31541934
8  23589748 23589748 23495235 23373569 23495235 23495235 23589748 23495235 23495235
9  15766101 15766101 15645916 15535787 15645916 15535787 15766101 15535787 15645916
10  6637675  6637675  6637675  6500634  6637675  6500634  6637675  6500634  6637675

NOTE: Adding the missing function.
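
If you would rather have the 90 means as 90 rows (one per type and decile group) instead of a 10 x 9 table, here is a rough base R sketch using aggregate. The names decile_rank, long and means_long are made up for this example; they do not come from the code above.

```r
# Rebuild the question's variable a.
set.seed(123)
a <- as.numeric(sample(1.1e6:87e6, 366, replace = TRUE))

# Decile group (1 = highest 10%, 10 = lowest 10%) for a given quantile type.
decile_rank <- function(type, x) {
  breaks <- quantile(x, probs = 0:10 / 10, type = type)
  11L - as.integer(cut(x, breaks, include.lowest = TRUE))
}

# Stack the nine assignments into one long data.frame.
long <- do.call(rbind, lapply(1:9, function(type) {
  data.frame(type = type, decile = decile_rank(type, a), a = a)
}))

# One mean of a per (type, decile) combination.
means_long <- aggregate(a ~ type + decile, data = long, FUN = mean)
```

means_long has the columns type, decile and a, where a holds the group mean.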

Upvotes: 1
