Kuo-Hsien Chang

Reputation: 935

ggplot: confused about user-defined bin number in density plot

My basic question is how to set the number of bins (the default is 30) for geom_density.

I found that the density on the y-axis did not change even after I modified the number of bins.

Here is an example:

library(ggplot2)

values <- runif(1000, 1, 100)
ind <- as.factor(rep(c(1:2), each=500))
inout <- as.factor(rep(c(1:2), each =500))
df <- data.frame(values,ind,inout)

ggplot(df,aes(x=values, ..density..)) + 
    geom_freqpoly(aes(group=interaction(ind,inout), colour=factor(inout)), alpha=1, bins=1) 

The density should be 1, because the number of bins was set to 1. However, the result was not what I expected.

Do you know what I am missing here? Any tips on how to define the number of bins or the bin thresholds for ggplot geom_density?

Thanks a lot.

Upvotes: 2

Views: 1166

Answers (1)

tsurudak

Reputation: 602

In ggplot you don't set the number of bins per se; instead you set the width of the bins using binwidth (the default is range/30). bins isn't an argument that geom_freqpoly understands, so it is ignored in your example code.
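If you think in terms of a bin count, one way to get there is to convert it into a binwidth yourself. The following is only a sketch: n_bins is a hypothetical target I made up, the seed is there purely for reproducibility, and the number of bins actually drawn can be off by one depending on how ggplot aligns the bin edges. (Later ggplot2 releases also accept a bins argument directly; this sticks to binwidth as described above.)

```r
# Sketch: turn a desired number of bins into a binwidth.
set.seed(42)                                   # assumption: only for reproducibility
values <- runif(1000, 1, 100)

n_bins   <- 10                                 # hypothetical target number of bins
binwidth <- diff(range(values)) / n_bins       # width that splits the data range into n_bins

# Plot only if ggplot2 is installed; the arithmetic above does not depend on it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  df <- data.frame(values)
  p <- ggplot(df, aes(x = values, y = ..density..)) +
    geom_freqpoly(binwidth = binwidth)
  print(p)
}
```

The default of range/30 is the same calculation with n_bins set to 30.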

I think an example using the range 0-1 (instead of 1-100) will better illustrate what you were expecting to see:

library(ggplot2)

values <- runif(1000, 0, 1) # generate values between 0 and 1
ind <- as.factor(rep(c(1:2), each=500))
inout <- as.factor(rep(c(1:2), each =500))
df <- data.frame(values,ind,inout)

ggplot(df, aes(x=values, ..density..)) + 
    geom_freqpoly(aes(group=interaction(ind,inout), 
                      colour=factor(inout)), alpha=1) #use default binwidth, i.e. 1/30

This gives a graph similar to the one your code generated:

geom_freqpoly with default binwidth

With a range of 1, setting binwidth = 1 means there will be a single bin, which gives a density of 1 at a value of 0.5. Notice that the range of values is now 0.5 to 1.5, as the area under the density curve must always sum to 1.

ggplot(df, aes(x=values, ..density..)) + 
    geom_freqpoly(aes(group=interaction(ind,inout), 
                      colour=factor(inout)), alpha=1, binwidth = 1) #binwidth = 1

geom_freqpoly with binwidth set to 1
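To see why the range matters, it may help to work out the number by hand. A density value for a bin is the bin count divided by (total count times binwidth), so a single bin only gives a density of 1 when its width is 1. The helper bin_density below is a name I made up for illustration, and the check uses base R's hist rather than ggplot, so treat it as a sketch of the arithmetic and not of ggplot's internals.

```r
# Density of one bin: count / (n * binwidth).
# With a single bin covering all the data, count equals n.
bin_density <- function(n, binwidth, count = n) count / (n * binwidth)

d_unit <- bin_density(n = 1000, binwidth = 1)    # data on [0, 1], one bin of width 1
d_orig <- bin_density(n = 1000, binwidth = 99)   # data on [1, 100], one bin of width 99

d_unit   # 1
d_orig   # 1/99, roughly 0.0101

# Cross-check with base R: one bin spanning [0, 1]
set.seed(42)                                     # assumption: only for reproducibility
x <- runif(1000, 0, 1)
d_hist <- hist(x, breaks = c(0, 1), plot = FALSE)$density
d_hist   # 1
```

So for the original data on 1-100, a single bin would give a density of about 0.01, not 1.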

If you increase the number of points you randomly generate and decrease the binwidth (e.g. try 0.1, 0.01, 0.001, etc.), you'll get closer to the "square-looking" probability density function you'd expect for a uniform distribution (e.g. as shown on Wikipedia).
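A rough way to check this without looking at plots is to measure how far the binned densities stray from the flat value of 1. This is a sketch using base R's hist; max_dev is a hypothetical helper, and the sample sizes and bin counts are arbitrary choices of mine.

```r
# Largest distance between the binned density and the true uniform density (1 on [0, 1]).
max_dev <- function(n, n_bins) {
  x      <- runif(n, 0, 1)
  breaks <- seq(0, 1, length.out = n_bins + 1)   # binwidth = 1 / n_bins
  dens   <- hist(x, breaks = breaks, plot = FALSE)$density
  max(abs(dens - 1))
}

set.seed(1)                                      # assumption: only for reproducibility
dev_small <- max_dev(n = 1000, n_bins = 10)      # few points, binwidth 0.1
dev_large <- max_dev(n = 1e6,  n_bins = 10)      # many points, binwidth 0.1
dev_fine  <- max_dev(n = 1e6,  n_bins = 100)     # many points, binwidth 0.01

c(dev_small, dev_large, dev_fine)
```

With few points the bins wobble noticeably around 1; with many points they stay close to 1 even at the narrower binwidth, which is the flat shape described above.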

Upvotes: 0
