atmosphere506

Reputation: 343

Histogram of uniform distribution not plotted correctly in R

When I run the code

hist(1:5)

or

hist(c(1,2,3,4,5))

the generated histogram shows the first number, 1, with a frequency of 2, even though there is only one 1 in the vector.


I also tried

hist(c(1,2,3,7,7,7,9))

but it still shows the first bar as twice as high as the second one.


However, when I run

hist(c(1:10))

every bar has the same height.

I'm pretty new to statistics and R, so I don't know the reason behind this. I hope somebody can help clarify why this is happening. Thank you.


Upvotes: 5

Views: 2040

Answers (3)

G. Grothendieck

Reputation: 270075

Try this:

> trace("hist.default", quote(print(fuzzybreaks)), at = 25)
Tracing function "hist.default" in package "graphics"
[1] "hist.default"
>
> out <- hist(1:5)
Tracing hist.default(1:5) step 25 
[1] 0.9999999 2.0000001 3.0000001 4.0000001 5.0000001
> out$count
[1] 2 1 1 1

which shows the actual fuzzybreaks value it is using as well as the count in each bin. Clearly there are two points in the first bin (between 0.9999999 and 2.0000001) and one point in every other bin.

Compare with:

> out <- hist(1:5, breaks = 0:5 + 0.5)
Tracing hist.default(1:5, breaks = 0:5 + 0.5) step 25 
[1] 0.4999999 1.5000001 2.5000001 3.5000001 4.5000001 5.5000001
> out$count
[1] 1 1 1 1 1

Now there is clearly one point in each bin.
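The same information can be read off without tracing, because hist() returns the breaks and counts it used. A minimal sketch, using plot = FALSE to skip the drawing step; the names h5 and h10 are only illustrative:

```r
# Compute the histograms without drawing them
h5  <- hist(1:5,  plot = FALSE)
h10 <- hist(1:10, plot = FALSE)

# The default breaks for 1:5 sit on the data values themselves,
# so the first bin [1, 2] holds both 1 and 2
print(h5$breaks)
print(h5$counts)

# The default breaks for 1:10 are wider, and the values spread evenly
# across the bins, which is why those bars all have the same height
print(h10$breaks)
print(h10$counts)
```

This should also account for the hist(1:10) case in the question: the bins are wider there, so each one ends up with the same number of values.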

Upvotes: 8

Scott Ritchie

Reputation: 10543

Taking your first example, hist(1:5), you have five numbers, which get put into four bins. So two of those five get lumped into one.

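The default breaks are at 1, 2, 3, 4 and 5, and the first bar spans 1 to 2, so you can reasonably infer that hist counts a value in a bin when it is less than or equal to the bin's upper break, with the lowest value also included in the first bin:

# bin membership under the defaults (right = TRUE, include.lowest = TRUE)
cut(1:5, breaks = 1:5, include.lowest = TRUE)  # both 1 and 2 land in [1,2]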

You can specify the breaks manually to solve this:

hist(1:5, breaks=0:5)
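To check the counts rather than read them off the plot, the object returned by hist() can be inspected directly. The sketch below also tries right = FALSE, an existing hist() argument that makes the bins left-closed. With the default breaks this only moves the doubled count to the last bin, because include.lowest = TRUE then folds the maximum into it. The variable names are illustrative.

```r
# Manual breaks 0:5: the bins end at 1, 2, 3, 4 and 5, one value in each
manual <- hist(1:5, breaks = 0:5, plot = FALSE)
print(manual$counts)

# Default breaks with left-closed bins: [1,2), [2,3), [3,4), [4,5]
# Now 4 and 5 share the last bin instead of 1 and 2 sharing the first
left_closed <- hist(1:5, right = FALSE, plot = FALSE)
print(left_closed$counts)
```

So changing which side of the bin is closed does not fix the plot; moving the breaks off the data values does.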


Upvotes: 12

FvD

Reputation: 3794

What you are seeing is that hist is placing 1:5 into four bins. So there will be one bin with 2 counts.

If you specify the cutoff points like so:

hist(1:5, breaks = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5))

then you will get the behaviour that you expect.
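For integer-valued data in general, the same idea can be written once instead of typing the cutoffs by hand. A minimal sketch: integer_breaks is a hypothetical helper name, not part of base R, and it assumes x contains whole numbers only.

```r
# Hypothetical helper: put the breaks halfway between consecutive integers,
# so that every integer value gets a bin of its own
integer_breaks <- function(x) {
  seq(min(x) - 0.5, max(x) + 0.5, by = 1)
}

x <- c(1, 2, 3, 7, 7, 7, 9)
h <- hist(x, breaks = integer_breaks(x), plot = FALSE)

# One count per integer from min(x) to max(x)
print(setNames(h$counts, min(x):max(x)))
```

Dropping plot = FALSE draws the histogram with one bar per integer, so the three 7s show up as a single bar of height 3.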

Upvotes: 5
