Reputation: 1
I have a CSV file in which the first column is a value and the second column is the number of times that value appears. Basically, it's a probability distribution. Now I want to use R to calculate a confidence interval: for example, what is the interval for a 95% confidence level, and what about 90%, 85%, and so on?
I searched for hours but couldn't find a proper way to do that. Sorry if this is a basic question.
Thanks, J
Upvotes: 0
Views: 865
Reputation: 263481
Sounds like you want a weighted quantile function. The Hmisc package provides one:
install.packages("Hmisc")
library(Hmisc)  # attach the package so wtd.quantile(), wtd.var() and describe() are available
# the first example from the help page for ?wtd.quantile
set.seed(1)
x <- runif(500)
wts <- sample(1:6, 500, TRUE)
std.dev <- sqrt(wtd.var(x, wts))
wtd.quantile(x, wts)
#-----------
         0%         25%         50%         75%        100%
0.001836858 0.262917845 0.482080115 0.747400865 0.996077372
death <- sample(0:1, 500, TRUE)
plot(wtd.loess.noiter(x, death, wts, type='evaluate'))
describe(~x, weights=wts)
#-----------
x
2 Variables 500 Observations
---------------------------------------------------------------------------
x
      n missing  unique    Info    Mean     .05     .10     .25     .50     .75     .90     .95
   1766       0     500       1   0.502 0.07068 0.11890 0.26292 0.48208 0.74740 0.91162 0.95515
lowest : 0.001837 0.001933 0.011150 0.013078 0.013390
highest: 0.991839 0.991906 0.992684 0.993749 0.996077
----------------------------------------------------------------------------
(weights)
      n missing  unique    Info    Mean
   1766       0       6    0.95   4.364

            1   2   3   4   5   6
Frequency  87 138 282 296 465 498
%           5   8  16  17  26  28
----------------------------------------------------------------------------
# describe uses wtd.mean, wtd.quantile, wtd.table
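Applied to a value/count table like the one in the question, the idea can be sketched in base R as follows. This assumes that "the interval for 95%" means the central range containing 95% of the observations, and that the counts are whole numbers. The data frame `dist`, its column names, and the helper `central_interval()` are made up for illustration; they are not part of Hmisc.

```r
# Hypothetical data standing in for the CSV: first column = value, second = count.
# With a real file, something like read.csv("your_file.csv") would go here instead.
dist <- data.frame(value = c(1, 2, 3, 4, 5),
                   count = c(5, 20, 50, 20, 5))

# Central interval covering `level` of the observations.
# Expands each value `count` times, then takes ordinary quantiles.
central_interval <- function(value, count, level = 0.95) {
  alpha <- 1 - level
  expanded <- rep(value, times = count)  # one entry per observation
  quantile(expanded, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

conf_levels <- c(0.95, 0.90, 0.85)
intervals <- t(sapply(conf_levels, function(l) {
  central_interval(dist$value, dist$count, l)
}))
dimnames(intervals) <- list(sprintf("%.0f%%", conf_levels * 100),
                            c("lower", "upper"))
print(intervals)
```

If the counts are too large to expand with rep(), calling wtd.quantile(dist$value, weights = dist$count, probs = c(0.025, 0.975)) from Hmisc avoids the expansion and should give closely matching limits.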
Upvotes: 5