Coolio2654

Reputation: 1739

The proper way to plot the PDF of a sample of data

I know this must be pretty basic, but what is the proper, accurate way to plot the PDF of sample data that you know comes from some population distribution, for example data generated with rnorm() or rexp()?

The reason I ask is that I know many people pass the output of density() to plot(), but density() seems too arbitrary to be accurate. For example, for data drawn from an exponential distribution, which has no negative values, it still estimates positive density at negative values.
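A minimal sketch of the problem, assuming a sample of 1000 draws from rexp() with rate 1 (the seed and sample size are arbitrary choices for illustration). By default density() extends its evaluation grid three bandwidths beyond the data (its cut argument), so part of the estimated area lands at negative x even though the exponential PDF is zero there:

```r
set.seed(1)
x <- rexp(1000, rate = 1)   # sample from Exp(1), whose PDF is 0 for x < 0
d <- density(x)             # Gaussian KDE with defaults: bw = "nrd0", cut = 3

# The grid runs from min(x) - 3*bw to max(x) + 3*bw, so it includes x < 0
neg <- d$x < 0
dx <- d$x[2] - d$x[1]                   # spacing of the equally spaced grid
mass_below_zero <- sum(d$y[neg]) * dx   # Riemann-sum approximation of that area
print(mass_below_zero)

# Compare the estimate with the known population PDF
xmax <- max(d$x)
plot(d, main = "density() vs. true Exp(1) PDF")
curve(dexp(x, rate = 1), from = 0, to = xmax, add = TRUE, lty = 2)
```

The dashed curve is the exact population PDF, which is what the sample estimate is being judged against.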

So could someone recommend a more fine-tuned method for plotting the PDF of a sample?

Upvotes: 3

Views: 2192

Answers (2)

DataTx

Reputation: 1869

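ggplot2 can help here: geom_density() computes the estimate only over the range of the observed data, so when all observations are non-negative it draws no density at negative values. It can be used in the following manner (the df line only creates example data):

library(ggplot2)
df <- data.frame(contVar = rexp(1000))  # example data: an exponential sample
ggplot(df, aes(x = contVar)) +
  geom_density(fill = "green", alpha = 0.3)  # a constant fill goes outside aes()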
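To check what geom_density() actually draws, here is a minimal sketch, assuming an Exp(1) sample of 1000 points as example data, that inspects the values ggplot2 computed for the layer with layer_data():

```r
library(ggplot2)

set.seed(1)
df <- data.frame(contVar = rexp(1000))  # example data: an Exp(1) sample

p <- ggplot(df, aes(x = contVar)) +
  geom_density(fill = "green", alpha = 0.3)

# layer_data() returns the values ggplot2 computed for the first layer
ld <- layer_data(p)
print(range(ld$x))       # grid spans the observed range of contVar
print(min(df$contVar))

# Trapezoidal area under the plotted curve
area <- sum(diff(ld$x) * (head(ld$density, -1) + tail(ld$density, -1)) / 2)
print(area)
```

Note that this only limits where the curve is drawn. The Gaussian kernels still spread some weight below the smallest observation; that weight is simply not plotted, so the area under the drawn curve comes out below 1 and the estimate near 0 stays lower than the true PDF.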

I would also take a look at this post on Cross Validated.

Upvotes: 0

Kelli-Jean

Reputation: 1447

The density function performs kernel density estimation (KDE). To find the best KDE for your dataset, you should tune the bandwidth (parameter bw). Here's a paper that discusses KDE and bandwidth selection: http://www.stat.washington.edu/courses/stat527/s13/readings/Sheather_StatSci_2004.pdf
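For reference, this is the standard textbook definition of the estimator, written for density()'s default Gaussian kernel. R scales each kernel so that bw is the kernel's standard deviation, which for the Gaussian kernel makes bw equal to the h below:

$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right), \qquad K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$$

Because K(u) > 0 for every u, each observation x_i puts weight on both sides of itself, so observations close to 0 always push some estimated density below 0; the choice of h controls how much, not whether.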

Or, for a simpler approach, try the different bandwidth selection rules that can be passed to bw: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/bandwidth.html

The current default, "nrd0", remains the default for historical and compatibility reasons rather than as a general recommendation. I have found that "ucv" and "bcv" work better for my datasets.
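A minimal sketch for comparing the rules from the bandwidth help page on one sample, assuming an Exp(1) sample so the true PDF is known (the seed, sample size, and plotting choices are arbitrary). The ucv and bcv rules may warn that their optimum lies at the end of the search interval; the code still runs:

```r
set.seed(1)
x <- rexp(1000, rate = 1)   # sample whose population PDF is known

# Fit one KDE per bandwidth rule accepted by density()'s bw argument
selectors <- c("nrd0", "nrd", "ucv", "bcv", "SJ")
fits <- lapply(selectors, function(s) density(x, bw = s))
names(fits) <- selectors

# Numeric bandwidth each rule selected
bws <- sapply(fits, function(d) d$bw)
print(bws)

# Plot all estimates together with the true Exp(1) density (dashed)
ymax <- max(1, sapply(fits, function(d) max(d$y)))
xmax <- max(x)
plot(fits[[1]], ylim = c(0, ymax), main = "KDE bandwidth rules vs. true PDF")
for (i in seq_along(fits)[-1]) lines(fits[[i]], col = i)
curve(dexp(x, rate = 1), from = 0, to = xmax, add = TRUE, lty = 2, lwd = 2)
legend("topright", legend = c(selectors, "dexp"),
       col = c(seq_along(selectors), 1),
       lty = c(rep(1, length(selectors)), 2))
```

All five fits use the same Gaussian kernel, so every curve still extends below 0; the rules differ in how smooth the estimate is, which is what to compare against the dashed true PDF.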

Upvotes: 1
