Reputation: 45
I'm using SAS to plot a histogram with a kernel density estimate. The documentation says that we can choose the parameter c: "the standardized bandwidth for a number that is greater than 0 and less than or equal to 100." But I cannot find the default value used to create the following plot.
Does anyone have an idea? Thanks!
Upvotes: 0
Views: 684
Reputation: 12909
SGPLOT minimizes the asymptotic mean integrated square error (AMISE) for the kernel density function. According to the documentation for PROC UNIVARIATE, which can also do KDE:
By default, the procedure uses the AMISE method to compute kernel density estimates.
We can confirm that they both have the same default by comparing the output.
proc univariate data=sashelp.cars;
var horsepower;
histogram / kernel;
run;
In the log, we find:
NOTE: The normal kernel estimate for c=0.7852 has a bandwidth of 21.035 and an AMISE of 392E-7.
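For context, the PROC UNIVARIATE documentation relates the standardized bandwidth c to the actual bandwidth λ through the sample interquartile range Q and the sample size n. As a rough consistency check on the log note above (assuming all n = 428 rows of sashelp.cars have a non-missing Horsepower value):

```latex
\lambda = c \, Q \, n^{-1/5}
\quad\Longrightarrow\quad
Q = \frac{\lambda}{c \, n^{-1/5}}
  \approx \frac{21.035}{0.7852 \times 428^{-1/5}}
  \approx 90
```

So the reported c and bandwidth are consistent with an interquartile range of about 90 for Horsepower.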
Let's plot them together and compare the values.
proc sgplot data=sashelp.cars;
density horsepower/TYPE=KERNEL;
density horsepower/TYPE=KERNEL(c=0.7852);
ods output sgplot=sgplot; /* name the output data set explicitly so the next step can read it */
run;
data diff;
set sgplot;
abs_diff = abs(KERNEL_Horsepower____Y - KERNEL_Horsepower_C_0_7852____Y);
run;
proc univariate data=diff;
var abs_diff;
run;
The average difference across all plotted points is 1.65x10^-9, and the largest is 6.76x10^-9. This is, essentially, zero. The differences arise because the c-value reported in the log has lower precision than the one calculated internally by proc sgplot. You can also get a higher-precision estimate with the outkernel= option in proc univariate.
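A minimal sketch of that last step, assuming the outkernel= data set stores the standardized bandwidth in a variable named _C_ (check this against proc contents on your installation; the data set name kde is just a placeholder):

```sas
/* Write the kernel density estimate to a data set named kde (hypothetical name) */
proc univariate data=sashelp.cars;
    var horsepower;
    histogram / kernel outkernel=kde;
run;

/* Print the stored c-value with more digits than the log note shows.
   _C_ is assumed to be the variable holding the standardized bandwidth. */
proc print data=kde(obs=1);
    var _c_;
    format _c_ best32.;
run;
```

Passing that higher-precision value to TYPE=KERNEL(c=...) in proc sgplot should shrink the differences further.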
Upvotes: 1