Reputation: 815
I have a data frame with a set of values and a factor.
df <- as.data.frame(matrix(sample(0:10, 1*30, replace=TRUE), ncol=1))
colnames(df)[1] <- "values"
df$factor <- rep(c("Factor.A", "Factor.B"), each = 15)
What I would like to do is calculate the 75th percentile of values within each group...
Percentile_75 <- aggregate(values ~ factor, data = df,
                           FUN = function(x) quantile(x, 0.75))
...and then count how many values in df exceed that threshold within each factor level. I can do this manually for each factor, but my real data has far more factor levels. Is there a neat (possibly dplyr) way to do this in one step? Thank you in advance.
Upvotes: 0
Views: 943
Reputation: 1095
A data.table approach:
df <- as.data.frame(matrix(sample(0:10, 1*30, replace=TRUE), ncol=1))
colnames(df)[1] <- "values"
df$factor<- rep(c("Factor.A","Factor.B"), each = 15)
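library(data.table)
setDT(df)  # converts df to a data.table by reference; no reassignment needed
# Add each group's 75th percentile as a column, then count the rows above it
df[, P_75 := quantile(values, probs = 0.75), by = factor][
  values > P_75, .(unique(P_75), .N), by = factor
]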
#      factor  V1 N
# 1: Factor.A 7.5 4
# 2: Factor.B 8.0 2
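One caveat with filtering first: a factor level with no values strictly above its threshold (possible when there are many ties) drops out of the result instead of showing N = 0. Below is a minimal sketch of a variant that keeps every level, assuming the same values and factor columns as above. The names dt, p75 and res and the seed are my own additions for illustration.

```r
library(data.table)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
dt <- data.table(
  values = sample(0:10, 30, replace = TRUE),
  factor = rep(c("Factor.A", "Factor.B"), each = 15)
)

# Compute the threshold and the count in a single grouped step, so each
# factor level yields exactly one row even when nothing exceeds the threshold.
res <- dt[, {
  p75 <- quantile(values, probs = 0.75)
  list(P_75 = p75, N = sum(values > p75))
}, by = factor]
res
```

This also gives the percentile column a proper name (P_75) rather than the default V1.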
Upvotes: 1
Reputation: 47330
With dplyr you can do this:
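library(dplyr)
df %>%
  group_by(factor) %>%
  # summarize() evaluates its arguments in order, so Percentile_75
  # is already available when n_sup is computed
  summarize(Percentile_75 = quantile(values, 0.75),
            n_sup = sum(values > Percentile_75))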
# # A tibble: 2 x 3
#   factor   Percentile_75 n_sup
#   <chr>            <dbl> <int>
# 1 Factor.A           8.5     4
# 2 Factor.B           8.5     4
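For comparison, roughly the same table can be built without any extra packages. This is only a sketch using base R's tapply(), assuming the values and factor columns from the question. The seed and the names p75, n_sup and res are illustrative, not part of the original code.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(
  values = sample(0:10, 30, replace = TRUE),
  factor = rep(c("Factor.A", "Factor.B"), each = 15)
)

# 75th percentile of values within each factor level
p75 <- tapply(df$values, df$factor, quantile, probs = 0.75)

# Number of values strictly above that group's own threshold
n_sup <- tapply(df$values, df$factor,
                function(x) sum(x > quantile(x, 0.75)))

res <- data.frame(
  factor        = names(p75),
  Percentile_75 = as.vector(p75),
  n_sup         = as.vector(n_sup)
)
res
```

Because both tapply() calls group by the same column, the two results come back in the same level order and can be combined directly.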
Upvotes: 2