Reputation: 3502
I want to compare the 9 types of quantiles.
I calculated the quantiles for variable a in a data.frame. For each type (1-9), I assigned every value of a to one of 10 quantile groups (with 1 as the highest 10% and 10 as the lowest 10%).
set.seed(123)
library(dplyr)
a <- as.numeric(sample(1.1e6:87e6, 366, replace = TRUE))
b <- runif(366, 0.005, 2.3)
df <- data.frame(a, b)
df <- df %>%
  mutate(type1 = 11 - as.integer(cut(a, quantile(a, probs = 0:10/10, type = 1), include.lowest = TRUE)),
         type2 = 11 - as.integer(cut(a, quantile(a, probs = 0:10/10, type = 2), include.lowest = TRUE)),
         type3 = 11 - as.integer(cut(a, quantile(a, probs = 0:10/10, type = 3), include.lowest = TRUE)),
         type4 = 11 - as.integer(cut(a, quantile(a, probs = 0:10/10, type = 4), include.lowest = TRUE)),
         type5 = 11 - as.integer(cut(a, quantile(a, probs = 0:10/10, type = 5), include.lowest = TRUE)),
         type6 = 11 - as.integer(cut(a, quantile(a, probs = 0:10/10, type = 6), include.lowest = TRUE)),
         type7 = 11 - as.integer(cut(a, quantile(a, probs = 0:10/10, type = 7), include.lowest = TRUE)),
         type8 = 11 - as.integer(cut(a, quantile(a, probs = 0:10/10, type = 8), include.lowest = TRUE)),
         type9 = 11 - as.integer(cut(a, quantile(a, probs = 0:10/10, type = 9), include.lowest = TRUE)))
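As a side illustration of why the nine columns can differ, the decile break points themselves can be tabulated per type. This is only a sketch on the same simulated a; the name breaks_by_type is made up for the example.

```r
set.seed(123)
a <- as.numeric(sample(1.1e6:87e6, 366, replace = TRUE))

# One column of 11 break points (0%, 10%, ..., 100%) per quantile type.
# breaks_by_type is a hypothetical name, not part of the original code.
breaks_by_type <- sapply(1:9, function(t) quantile(a, probs = 0:10/10, type = t))
colnames(breaks_by_type) <- paste0("type", 1:9)

# Rows are the probabilities, columns the types; interior rows may differ
# slightly between types, which is what moves values between groups.
breaks_by_type
```

The first and last rows are the minimum and maximum of a for every type, so only the interior break points can shift observations from one group to another.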
I want to calculate the mean of a within each of the 10 quantile groups, for each of the 9 types. I should end up with 90 mean values of a.
How can I do that?
Upvotes: 0
Views: 351
Reputation: 43344
Continuing with dplyr, you can use lapply to loop over the quantile columns, group_by_ one column at a time, and summarise to calculate the grouped means. do.call(cbind, ...) collects the columns of means and binds them into a new data.frame.
means_a <- do.call(cbind, lapply(names(df)[3:11], function(x) {
  group_by_(df, x) %>%
    summarise(m = mean(a)) %>%
    select(m)
}))
# clean up names
names(means_a) <- names(df)[3:11]
You're left with
> means_a
type1 type2 type3 type4 type5 type6 type7 type8 type9
1 82835646 82835646 82704531 82704531 82704531 82835646 82704531 82835646 82835646
2 73922430 73922430 73809597 73674619 73809597 73922430 73809597 73922430 73922430
3 64571479 64571479 64449537 64328263 64449537 64449537 64449537 64449537 64449537
4 56421583 56421583 56320527 56207920 56320527 56320527 56320527 56320527 56320527
5 47065506 47065506 47065506 46924157 47065506 47065506 47065506 47065506 47065506
6 38559879 38559879 38468169 38468169 38468169 38468169 38559879 38468169 38468169
7 31639898 31639898 31541934 31442833 31541934 31541934 31639898 31541934 31541934
8 23589748 23589748 23495235 23373569 23495235 23495235 23589748 23495235 23495235
9 15766101 15766101 15645916 15535787 15645916 15535787 15766101 15535787 15645916
10 6637675 6637675 6637675 6500634 6637675 6500634 6637675 6500634 6637675
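If a long layout is more convenient (one row per type and group, 90 rows in total), a possible variant reshapes the type columns first and then groups once. This is a sketch, not the method used above: it assumes the tidyr package is available, and the names decile, means_long and mean_a are invented for the example.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is installed; it is not used in the original post

set.seed(123)
a <- as.numeric(sample(1.1e6:87e6, 366, replace = TRUE))
df <- data.frame(a)

# Hypothetical helper: group label for one quantile type
# (1 = highest 10%, 10 = lowest 10%), same formula as in the question.
decile <- function(x, type) {
  11L - as.integer(cut(x, quantile(x, probs = 0:10/10, type = type),
                       include.lowest = TRUE))
}

# Add the columns type1 ... type9
for (t in 1:9) df[[paste0("type", t)]] <- decile(df$a, t)

# Stack the nine type columns, then take one grouped mean per (type, group)
means_long <- df %>%
  pivot_longer(starts_with("type"), names_to = "type", values_to = "decile") %>%
  group_by(type, decile) %>%
  summarise(mean_a = mean(a), .groups = "drop")
```

means_long then holds the same means as means_a, just with type and decile as columns instead of a 10 x 9 grid.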
Upvotes: 1
Reputation: 10483
This is one approach that produces the desired 90 means:
f <- function(type, x) {
  11 - as.integer(cut(x, quantile(x, probs = 0:10/10, type = type), include.lowest = TRUE))
}
set.seed(123)
a <- as.numeric(sample(1.1e6:87e6, 366, replace = TRUE))
b <- runif(366, 0.005, 2.3)
df <- data.frame(a, b)
df <- cbind(df, data.frame(sapply(1:9, f, x = df$a)))
sapply(df[, 3:11], function(x) tapply(df$a, x, mean))
X1 X2 X3 X4 X5 X6 X7 X8 X9
1 82835646 82835646 82704531 82704531 82704531 82835646 82704531 82835646 82835646
2 73922430 73922430 73809597 73674619 73809597 73922430 73809597 73922430 73922430
3 64571479 64571479 64449537 64328263 64449537 64449537 64449537 64449537 64449537
4 56421583 56421583 56320527 56207920 56320527 56320527 56320527 56320527 56320527
5 47065506 47065506 47065506 46924157 47065506 47065506 47065506 47065506 47065506
6 38559879 38559879 38468169 38468169 38468169 38468169 38559879 38468169 38468169
7 31639898 31639898 31541934 31442833 31541934 31541934 31639898 31541934 31541934
8 23589748 23589748 23495235 23373569 23495235 23495235 23589748 23495235 23495235
9 15766101 15766101 15645916 15535787 15645916 15535787 15766101 15535787 15645916
10 6637675 6637675 6637675 6500634 6637675 6500634 6637675 6500634 6637675
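If the 90 means are wanted as a single column rather than a 10 x 9 matrix, one base R option is to go through as.table. This is a sketch under the same simulated data; the names groups, means, means_long and mean_a are chosen for the example only.

```r
set.seed(123)
a <- as.numeric(sample(1.1e6:87e6, 366, replace = TRUE))

# Same helper as above: group label per quantile type
f <- function(type, x) {
  11L - as.integer(cut(x, quantile(x, probs = 0:10/10, type = type),
                       include.lowest = TRUE))
}

groups <- sapply(1:9, f, x = a)              # 366 x 9 matrix of group labels
colnames(groups) <- paste0("type", 1:9)

# 10 x 9 matrix of means: rows are groups, columns are types
means <- apply(groups, 2, function(g) tapply(a, g, mean))

# Long form: one row per (group, type) pair
means_long <- as.data.frame(as.table(means), responseName = "mean_a")
names(means_long)[1:2] <- c("decile", "type")
```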
NOTE: Edited to add the missing function f.
Upvotes: 1