I have histogram data of the form:
Key | #occurrences_of_key
--------------------------
-10 | 1200
0 | 1000
10 | 700
33 | 500
67 | 200
89 | 134
--------------------------
R code to reproduce it (column 1 holds the keys, column 2 the counts):
structure(c(-10, 0, 10, 33, 67, 89, 1200, 1000, 700, 500, 200, 134), .Dim = c(6L, 2L))
I want to plot an empirical cumulative distribution chart (percentile chart) of this data using R. I am new to R, so I would appreciate any pointers. I read about the ecdf function available in R, but it is hard for me to follow.
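For reference, the quantity ecdf computes is the running total of counts divided by the grand total. Writing k_i for the keys and c_i for their counts (notation introduced here for illustration only), the table above gives:

```latex
\begin{aligned}
F_n(x) &= \frac{1}{n} \sum_{i \,:\, k_i \le x} c_i,
\qquad n = \sum_i c_i = 1200 + 1000 + 700 + 500 + 200 + 134 = 3734 \\
F_n(-10) &= \tfrac{1200}{3734} \approx 0.3214, \quad
F_n(0) = \tfrac{2200}{3734} \approx 0.5892, \quad
F_n(10) = \tfrac{2900}{3734} \approx 0.7766, \\
F_n(33) &= \tfrac{3400}{3734} \approx 0.9106, \quad
F_n(67) = \tfrac{3600}{3734} \approx 0.9641, \quad
F_n(89) = \tfrac{3734}{3734} = 1
\end{aligned}
```

F_n is 0 to the left of -10 and jumps at each key; that step function is what a percentile chart draws.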
If your data is huge (and that is why you tabulated it before loading it into R), you don't want to expand it back into 'dummy' raw observations. Instead, you can adapt the implementation of ecdf to accept tabulated data:
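tab_ecdf <- function(xs, counts)
{
  # Sort keys (and their counts): approxfun() with ties = "ordered" assumes sorted x
  o <- order(xs)
  xs <- xs[o]
  counts <- counts[o]
  n <- sum(counts)
  if (n < 1)
    stop("'counts' must sum to 1 or more")
  # Right-continuous step function: F(x) = (number of observations <= x) / n
  rval <- approxfun(xs, cumsum(counts) / n,
                    method = "constant", yleft = 0, yright = 1, f = 0, ties = "ordered")
  # Same classes and 'nobs' as stats::ecdf(), so plot() uses the ecdf/stepfun methods
  class(rval) <- c("ecdf", "stepfun", class(rval))
  assign("nobs", n, envir = environment(rval))
  attr(rval, "call") <- sys.call()
  rval
}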
Then use it in place of the original ecdf() function, passing the keys as xs and their counts as counts.
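A lighter-weight alternative, not part of the answer above and sketched here under the assumption that only a plot is needed, is to build the same step function directly with stats::stepfun(). The variable names keys, counts and Fn are illustrative. The result is a plain stepfun rather than an ecdf object, so only the stepfun methods (plot, knots, print) apply; tab_ecdf() above keeps the ecdf class.

```r
# Illustrative names; values copied from the question's table
keys   <- c(-10, 0, 10, 33, 67, 89)
counts <- c(1200, 1000, 700, 500, 200, 134)

# stepfun() needs length(keys) + 1 y values: the level left of the first key (0),
# followed by the cumulative proportion reached at each key
Fn <- stepfun(keys, c(0, cumsum(counts) / sum(counts)))

# Percentile chart: steps joined by vertical segments, no points drawn
plot(Fn, verticals = TRUE, do.points = FALSE,
     main = "Empirical CDF from tabulated data",
     xlab = "Key", ylab = "Cumulative proportion")
```

With the default right = FALSE, each step holds on [k_i, k_{i+1}), which matches the right-continuous ECDF.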
One way would be to use rep to reconstruct the original observations and then call ecdf on them.
mat <- structure(c(-10, 0, 10, 33, 67, 89, 1200, 1000, 700, 500, 200, 134), .Dim = c(6L, 2L))
# Repeat each key (column 1) as many times as its count (column 2)
original <- rep(mat[, 1], times = mat[, 2])
original_ecdf <- ecdf(original)
plot(original_ecdf)
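As a quick sanity check (not part of the original answer), the reconstructed ECDF evaluated at each key should match the table's cumulative proportions, and quantile() on an ecdf object reads percentiles back off it. The names keys, counts, Fn and expected are illustrative. Note that rep() materialises sum(counts) values (3734 here), which is the cost the tabulated approach avoids for very large tables.

```r
mat <- structure(c(-10, 0, 10, 33, 67, 89, 1200, 1000, 700, 500, 200, 134), .Dim = c(6L, 2L))
keys   <- mat[, 1]
counts <- mat[, 2]

# Expand the table back to one value per observation
original <- rep(keys, times = counts)
Fn <- ecdf(original)

# At every key, the ECDF should equal the running total of counts over the grand total
expected <- cumsum(counts) / sum(counts)
print(cbind(key = keys, Fn = Fn(keys), expected = expected))

# Read a percentile back off the ECDF (the median here)
print(quantile(Fn, 0.5))
```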