Nottles82

Reputation: 103

Produce quantiles based on levels of another variable

I have a dataset with employers and employees. Each employee has a salary assigned. Using the aggregate function, I have been able to aggregate gross salaries by employer to obtain a single point estimate of gross salary for each employer. Now I would like to show the distribution of earnings within each employer, so I want to compute percentiles.

I've written this code, which produces the percentiles for the overall data, but I would like the percentiles for each individual employer:

pct <- quantile(salary, c(0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1))

I then tried aggregate again, like so, but R doesn't like this:

aggregate (pct, by = list(employer), FUN=length)

To be honest, I don't know what FUN to assign here. I just chose length.

I've read the answers to the question "Quantiles by factor levels in R", but the programming is beyond my understanding.

Thanks

Upvotes: 0

Views: 874

Answers (1)

josliber

Reputation: 44299

You can compute your quantiles with the tapply function:

# Making sample data...
set.seed(144)
dat <- data.frame(employer=c(rep("A", 100), rep("B", 100)),
                  salary=rnorm(200))

# Compute salary quantiles for each employer
tapply(dat$salary, dat$employer, quantile, probs=seq(0, 1, .1))
# $A
#          0%         10%         20%         30%         40%         50%         60%         70% 
# -2.41444189 -1.40732877 -1.12317885 -0.64970145 -0.47523453 -0.09430894  0.15215525  0.35878949 
#         80%         90%        100% 
#  0.65762946  1.08900468  2.60805224 
# 
# $B
#          0%         10%         20%         30%         40%         50%         60%         70% 
# -2.94139814 -1.27564687 -0.95004621 -0.57881100 -0.31022591 -0.14494699 -0.02373928  0.50534378 
#         80%         90%        100% 
#  0.92179302  1.41398773  1.98714112 
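
As for why the attempt in the question does not work: most likely it is because pct already holds only the 11 overall quantiles, while employer has one entry per employee, so the two cannot be lined up row by row. The grouping has to be applied to the raw salaries, with quantile as the function. Here is a minimal sketch using free-standing vectors named as in the question; the values themselves are made up for illustration:

```r
# Hypothetical stand-ins for the question's vectors (made-up values)
set.seed(1)
employer <- rep(c("A", "B", "C"), each = 50)
salary <- round(runif(150, 20000, 80000))

# The overall quantiles collapse the data to 11 numbers...
pct <- quantile(salary, seq(0, 1, 0.1))
length(pct)       # 11
length(employer)  # 150, one entry per employee

# ...so group the raw salaries instead and let quantile run once per employer
by_employer <- tapply(salary, employer, quantile, probs = seq(0, 1, 0.1))
by_employer$A
```

Any extra arguments after the function name (here probs) are passed through to quantile for each group.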

To get it all into one data frame for outputting, you can use the same arguments, but with the aggregate function:

aggregate(dat$salary, list(dat$employer), quantile, probs=seq(0, 1, .1))
#   Group.1        x.0%       x.10%       x.20%       x.30%       x.40%       x.50%       x.60%       x.70%
# 1       A -2.41444189 -1.40732877 -1.12317885 -0.64970145 -0.47523453 -0.09430894  0.15215525  0.35878949
# 2       B -2.94139814 -1.27564687 -0.95004621 -0.57881100 -0.31022591 -0.14494699 -0.02373928  0.50534378
#         x.80%       x.90%      x.100%
# 1  0.65762946  1.08900468  2.60805224
# 2  0.92179302  1.41398773  1.98714112
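
One thing to be aware of: the quantiles returned by aggregate are stored as a single matrix column inside the data frame, which can be awkward when writing to a file. A possible way to expand it into ordinary columns is sketched below, assuming the same sample data as above; the names res and flat are just illustrative, and the formula interface is used only so that the columns keep their original names:

```r
# Same sample data as above
set.seed(144)
dat <- data.frame(employer = c(rep("A", 100), rep("B", 100)),
                  salary = rnorm(200))

# Formula interface: result has columns "employer" and "salary"
res <- aggregate(salary ~ employer, data = dat,
                 FUN = quantile, probs = seq(0, 1, 0.1))
ncol(res)  # 2: employer, plus one matrix column holding all 11 quantiles

# Expand the matrix column into one ordinary column per quantile
flat <- cbind(res["employer"], as.data.frame(res$salary))
ncol(flat)  # 12: employer plus 11 quantile columns
```

The flat data frame can then be passed to something like write.csv in the usual way.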

Upvotes: 2
