Reputation: 3
So my data looks like this, with # of rows = 6000 approx.
pidp avgy06
1 68160489 20182.36849
2 68575973 13845.49024
3 69180553 35.61806
4 69786365 13117.26465
5 69815605 15791.40283
6 69833973 10327.94531
I would like to know if there's a way to apply the quantile()
function, so as to add another column that gives me what percentile of the data is each value of avgy06
. For example, this is the 100 percentiles of avgy06
:
0% 1% 2% 3% 4% 5% 6% 7% 8% 9% 10%
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 189.0078
11% 12% 13% 14% 15% 16% 17% 18% 19% 20% 21%
790.2671 1505.4875 2364.4903 2900.0230 3441.0689 3680.2787 4246.6805 4595.0131 4704.8372 4904.6381 5217.9201
22% 23% 24% 25% 26% 27% 28% 29% 30% 31% 32%
5421.2263 5621.4581 6166.7022 6673.1660 6851.0085 7261.1324 7588.7569 7947.6250 8292.3789 8606.2774 8938.2232
33% 34% 35% 36% 37% 38% 39% 40% 41% 42% 43%
9286.9695 9665.7901 9885.2171 10035.7984 10280.0676 10423.1376 10633.2589 10886.2913 11205.7540 11411.0259 11581.6681
44% 45% 46% 47% 48% 49% 50% 51% 52% 53% 54%
11763.5549 11926.4006 12210.2935 12434.3433 12581.4526 12781.9956 13135.6904 13305.6350 13666.1352 13814.4657 14046.4000
55% 56% 57% 58% 59% 60% 61% 62% 63% 64% 65%
14258.2219 14431.6258 14631.6608 14940.7309 15168.2559 15385.1055 15583.7370 15757.0793 15906.4169 16094.3642 16448.5898
66% 67% 68% 69% 70% 71% 72% 73% 74% 75% 76%
16683.5195 16817.0613 17049.2498 17361.5975 17663.5911 18004.6763 18309.8879 18614.3184 18871.4102 19220.2478 19529.0051
77% 78% 79% 80% 81% 82% 83% 84% 85% 86% 87%
19962.8668 20249.0984 20526.2794 20690.6686 20896.2913 21135.7998 21396.8414 21763.6818 22070.5915 22494.2696 23000.0000
88% 89% 90% 91% 92% 93% 94% 95% 96% 97% 98%
23486.0340 24206.6486 25106.3743 26261.0410 26593.7715 27402.7684 28079.6456 28910.4655 30315.5573 32447.8075 39225.6094
99% 100%
41759.9540 57456.0758
I would like an extra column in my dataframe that reads the value of avgy06
and returns to what percentile of the data does such value correspond to (e.g 75th, 63th...). If there's another way without using the quantile()
function, let me know.
Many thanks!
Upvotes: 0
Views: 113
Reputation: 8602
I'll use another dataset for illustration. What you're looking for is the empirical cumulative distribution function or ecdf
.
data(iris) #data for illustration
data_ecdf <- ecdf(iris[, 'Sepal.Length'])
iris[, 'Sepal.Length.Percentile'] <- data_ecdf(iris[, 'Sepal.Length'])
head(iris)
# Output:
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length.Percentile
# 1 5.1 3.5 1.4 0.2 setosa 0.27333333
# 2 4.9 3.0 1.4 0.2 setosa 0.14666667
# 3 4.7 3.2 1.3 0.2 setosa 0.07333333
# 4 4.6 3.1 1.5 0.2 setosa 0.06000000
# 5 5.0 3.6 1.4 0.2 setosa 0.21333333
# 6 5.4 3.9 1.7 0.4 setosa 0.34666667
Upvotes: 2