daremoinai

Reputation: 25

Grid specification in smooth c.d.f. estimation ("kerdiest" package)

I wanted to get a smooth estimate of a cumulative distribution function. One way to do this is to integrate a kernel density estimator, which gives a kernel distribution estimator. To get one, I used the kde function from the "kerdiest" package.
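In formulas (standard notation, assumed here rather than quoted from the kerdiest documentation), with kernel K, bandwidth h and observations X_1, ..., X_n, integrating the kernel density estimator term by term gives a sum of kernel c.d.f.s:

```latex
\hat f_h(t) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{t - X_i}{h}\right),
\qquad
\hat F_h(x) = \int_{-\infty}^{x} \hat f_h(t)\,dt
            = \frac{1}{n}\sum_{i=1}^{n} \mathbb{K}\!\left(\frac{x - X_i}{h}\right),
\qquad
\mathbb{K}(u) = \int_{-\infty}^{u} K(v)\,dv .
```

The last step uses the substitution v = (t - X_i)/h, so dt = h dv. For a normal kernel (presumably what type_kernel = "n" selects), the kernel c.d.f. is the standard normal c.d.f. Φ.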

The problem is that I have to specify a grid, which affects the results greatly. The default choice of grid leads to a graph that differs significantly from the plot of the empirical distribution function (see the picture; white dots represent the empirical c.d.f.). I can pick grid values so that the kernel estimator and the ecdf coincide, but I do not understand how it works.

So, what is the grid and how should it be chosen? Is there any other way to get a kernel estimator of a distribution function?

The data I have been experimenting with are the waiting times from R's built-in Old Faithful Geyser dataset (faithful). The code is:

x <- faithful$waiting
library("kerdiest")
n = length(x)
kcdf <- kde(type_kernel = "n", x, bw = 1/sqrt(n))
plot(kcdf$Estimated_values)
lines(ecdf(x))

[Figure: kernel c.d.f. estimate with the default grid; white dots show the empirical c.d.f.]

Upvotes: 0


Answers (1)

IRTFM

Reputation: 263362

Instead of plotting with the default plot function, you should use both the Estimated_values and the grid values to form the initial plot. Then the lines function will have the correct x-values. (The clue here is the labeling of your plot's x-axis. When you saw the "Index" label, you might have wondered whether it was the correct scale. When plot gets a single vector of numeric values, it uses each value's position in the vector as the "Index" value, so the x-axis shows the integers 1:length(vector).)

with( kcdf, plot(Estimated_values ~ grid) )  # using plot.formula
lines(ecdf(x))
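On the other question (another way to get a kernel estimator of the distribution function), the estimator is simple enough to compute directly, and doing so shows what the grid is: just the set of x-values at which the estimate is evaluated. Below is a minimal base-R sketch, assuming a Gaussian kernel. The helper name kcdf_manual, the rule-of-thumb bandwidth bw.nrd0(x) and the 200-point grid are illustrative assumptions, not part of kerdiest.

```r
# Hypothetical helper (not from kerdiest): Gaussian-kernel c.d.f. estimate,
# F_hat(t) = mean(pnorm((t - X_i) / h)), evaluated at each point of 'grid'.
kcdf_manual <- function(x, grid, h) {
  # For each grid point t, average the normal c.d.f. over all observations
  vapply(grid, function(t) mean(pnorm((t - x) / h)), numeric(1))
}

x <- faithful$waiting
h <- bw.nrd0(x)  # assumption: rule-of-thumb bandwidth, chosen for illustration
# The grid only says where to evaluate F_hat; here it extends 3h past the data
grid <- seq(min(x) - 3 * h, max(x) + 3 * h, length.out = 200)

Fhat <- kcdf_manual(x, grid, h)

# Plot against the grid (not the index), then overlay the ecdf
plot(grid, Fhat, type = "l", xlab = "waiting", ylab = "F(x)")
lines(ecdf(x))
```

Changing the grid only changes the points at which the estimate is evaluated (and therefore the x-values of the plot); the shape of the estimate itself is governed by the bandwidth h.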


Upvotes: 1
