Reputation: 1739
I know this must be pretty basic, but what is the proper, accurate way to plot the PDF of sample data that you know comes from some population distribution, for example data generated with rnorm() or rexp()?
The reason I ask is that I know a lot of people use density() and then pass the result to plot(), but density() seems too arbitrary to be accurate. For example, it assigns density to negative values for data drawn from the exponential distribution, which has no negative values.
So could someone recommend a more fine-tuned method for plotting sample PDFs?
Upvotes: 3
Views: 2192
Reputation: 1869
ggplot does help take care of negative values when they are not appropriate. It can be used in the following manner:
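library(ggplot2)
# df is a data frame with a numeric column contVar.
# fill is a fixed colour, so it is set in geom_density() and not inside aes().
ggplot(df, aes(x = contVar)) +
  geom_density(fill = "green", alpha = 0.3)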
I would also take a look at this post on Cross Validated.
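For data with a hard lower bound at zero, one common workaround in base R is to reflect the sample about the boundary before estimating. The following is only a minimal sketch of that idea, not something taken from the post above: the sample size, the seed, the use of bw.nrd0() and the upper plotting limit are all my own assumptions.

```r
set.seed(1)
x <- rexp(500)                 # simulated sample, known to be non-negative

# Plain KDE: the kernel spills some mass below zero
d_plain <- density(x)

# Reflection sketch (assumed approach): mirror the data about 0,
# estimate on the combined sample, evaluate only on x >= 0,
# then double the height so the curve still integrates to about 1
bw <- bw.nrd0(x)
d_ref <- density(c(x, -x), bw = bw, from = 0, to = max(x) + 3 * bw)
d_ref$y <- 2 * d_ref$y

plot(d_ref, main = "KDE with reflection at 0", xlab = "x")
lines(d_plain, lty = 2)                    # uncorrected estimate
curve(dexp(x), add = TRUE, col = "red")    # true density, for comparison
```

The dashed curve shows the default estimate leaking below zero, while the reflected estimate stays on the non-negative axis. Reflection is only one of several boundary corrections, so treat it as a starting point.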
Upvotes: 0
Reputation: 1447
The density function performs kernel density estimation (KDE). To find the best KDE for your dataset, you should tune the bandwidth (parameter bw). Here's a paper that discusses KDE and bandwidth selection: http://www.stat.washington.edu/courses/stat527/s13/readings/Sheather_StatSci_2004.pdf
Or, for a simpler approach, you can try the different bandwidth selectors that can be passed to bw: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/bandwidth.html
The current default, "nrd0", is there for historical reasons. I find "ucv" and "bcv" have worked better for my datasets.
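As a rough illustration of trying different selectors, here is a small sketch that fits density() with several bw values on a simulated sample and overlays the results. The sample, the seed and the inclusion of "SJ" (which is listed on the bandwidth help page but not mentioned above) are my own assumptions, and "ucv" or "bcv" may emit warnings on some data.

```r
set.seed(42)
x <- rnorm(200)                          # simulated sample with a known density

selectors <- c("nrd0", "ucv", "bcv", "SJ")
fits <- lapply(selectors, function(s) density(x, bw = s))
names(fits) <- selectors

# Bandwidth actually chosen by each selector
bws <- sapply(fits, function(d) d$bw)
print(bws)

# Overlay the estimates against the true density
ymax <- max(sapply(fits, function(d) max(d$y)), dnorm(0))
plot(fits[["nrd0"]], ylim = c(0, ymax), main = "Bandwidth selectors compared")
for (i in 2:length(fits)) lines(fits[[i]], col = i)
curve(dnorm(x), add = TRUE, lty = 2)     # true density, for comparison
legend("topright", legend = c(selectors, "true"),
       col = c(1:length(fits), 1), lty = c(rep(1, length(fits)), 2))
```

Because the true density is known here, it is easy to see which selector over- or under-smooths. On real data you would have to judge the curves without that reference.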
Upvotes: 1