Igor F.

Specifying exact bin range in geom_histogram

I have a data set with events happening at certain times of the day. I'd like to make a histogram of events per hour. I looked up "R - emulate the default behavior of hist() with ggplot2 for bin width" and "R hist vs geom_hist break points", which brought me this far, but this is still not what I want. I tried this:

library(ggplot2)
set.seed(1)
# 100 event times in hours, on a half-hour grid, wrapped into [0, 24)
df1 = data.frame(t=as.integer(rnorm(100, 25, 8)) / 2) %% 24
ggplot(data=df1, aes(x=t)) +
  geom_histogram(binwidth=1, colour="steelblue4", fill="steelblue") +
  # print each bin's count above its bar
  stat_bin(binwidth = 1, aes(label=after_stat(count)), vjust=-0.5, geom = "text") +
  scale_x_continuous("Time",
    breaks=seq(0, 23, by=4),
    labels=c("00:00", "04:00", "08:00", "12:00", "16:00", "20:00")) +
  scale_y_continuous(breaks = 0:15)

and got this image:

Incorrect histogram
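One way to check which intervals geom_histogram() actually chose is to inspect the computed layer with ggplot2's layer_data(). The sketch below rebuilds only the histogram layer from the data above. It assumes (worth verifying on your installed version) that when neither `center` nor `boundary` is given, ggplot2 centers bins of binwidth = 1 on whole numbers, so the bin edges fall on half hours, exactly on the grid the event times live on.

```r
library(ggplot2)

set.seed(1)
df1 <- data.frame(t = as.integer(rnorm(100, 25, 8)) / 2) %% 24

p <- ggplot(df1, aes(x = t)) +
  geom_histogram(binwidth = 1)

# One row per bin: left edge, right edge and number of events in it
bins <- layer_data(p)[, c("xmin", "xmax", "count")]
print(bins)

# With the default alignment every edge sits on a half hour (x.5), so an
# event at 03:30 or 04:30 lies exactly on an edge and is pushed into a
# neighbouring bar by the default right-closed rule
print(unique(bins$xmin %% 1))
```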

This histogram is 1) incorrect and 2) not showing what I want. The histogram gives the impression that there are two events at (or around) 04:00. When we look at the data, we see that there is one event at 3.5 (i.e. 03:30) and one at 4.5 (04:30). What I actually want is a histogram showing the number of events in each of the ranges [00:00, 01:00), [01:00, 02:00), ..., [23:00, 24:00). The event at 03:30 should be assigned to a different bin than the event at 04:30. I'd also like the histogram to span the whole day, from 00:00 to 24:00. Something like this (photoshopped!):

Better histogram, manually adjusted

which is congruent with

Time <- cut(df1$t, breaks = 0:24, dig.lab = 4, right = FALSE) 
as.data.frame(table(Time))

      Time Freq
1    [0,1)    0
2    [1,2)    0
3    [2,3)    0
4    [3,4)    1
5    [4,5)    1
6    [5,6)    1
7    [6,7)    3
8    [7,8)    4
9    [8,9)    2
10  [9,10)    7
11 [10,11)   11
12 [11,12)    8
13 [12,13)   12
14 [13,14)   10
15 [14,15)   14
16 [15,16)    8
17 [16,17)    6
18 [17,18)    4
19 [18,19)    5
20 [19,20)    0
21 [20,21)    1
22 [21,22)    1
23 [22,23)    1
24 [23,24)    0
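For comparison, base R's hist() can produce the same left-closed hourly counts directly: passing breaks = 0:24 together with right = FALSE makes each interval [h, h+1). hist() closes the last interval as [23, 24] because include.lowest defaults to TRUE, but since t is reduced modulo 24 no value can reach 24, so that makes no difference here.

```r
set.seed(1)
df1 <- data.frame(t = as.integer(rnorm(100, 25, 8)) / 2) %% 24

# Left-closed hourly intervals [0,1), [1,2), ..., [23,24]; plot = FALSE only computes
h <- hist(df1$t, breaks = 0:24, right = FALSE, plot = FALSE)

# The counts from cut(..., right = FALSE), for a side-by-side comparison
tab <- table(cut(df1$t, breaks = 0:24, right = FALSE))
print(data.frame(hour = 0:23, hist = h$counts, cut = as.vector(tab)))
```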

Is this possible at all using geom_histogram() and, if not, what else should I use?

Answers (2)

Myan RcManus

Try experimenting with geom_histogram() on its own, using the following arguments:

geom_histogram(binwidth=1, center = 0.5, colour="steelblue4", fill="steelblue")

This should center the first bin at 0.5; with binwidth = 1 that bin extends from 0 to 1, and each subsequent bin is likewise one unit wide. I'm not sure how it will behave if you want the next bin to include a value that is also in the first bin. This worked for me when I wanted weekly bins on time data (e.g., days 1-7 form one bin because they make up week 1, days 8-14 form the next bin because they make up week 2, and so on).
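A sketch of how this suggestion might be adapted to the question's [h, h+1) requirement. Two extra settings are an assumption here, not something the answer states: closed = "left", because geom_histogram() closes bins on the right by default, so with center = 0.5 alone an event at exactly 04:00 would be counted in the 03:00 bin; and limits = c(0, 24) on the x scale, so the bins are computed over the whole day, including empty hours at both ends. (boundary = 0 describes the same bin edges as center = 0.5.)

```r
library(ggplot2)

set.seed(1)
df1 <- data.frame(t = as.integer(rnorm(100, 25, 8)) / 2) %% 24

p <- ggplot(df1, aes(x = t)) +
  # center = 0.5 with binwidth = 1 puts bin edges on whole hours;
  # closed = "left" makes each bin [h, h+1) instead of the default (h, h+1]
  geom_histogram(binwidth = 1, center = 0.5, closed = "left",
                 colour = "steelblue4", fill = "steelblue") +
  # limits span the whole day, so bins are computed over 0..24
  scale_x_continuous("Time", limits = c(0, 24), breaks = seq(0, 24, by = 4),
                     labels = sprintf("%02d:00", seq(0, 24, by = 4)))

# Inspect the bins ggplot2 computed: edges and counts per hour
bins <- layer_data(p)[, c("xmin", "xmax", "count")]
print(bins)
```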

Igor F.

A solution may be to use geom_col() instead of geom_histogram():

# Count events per hour interval [h, h+1)
Time <- cut(df1$t, breaks = 0:24, dig.lab = 4, right = FALSE) 
# One bar per interval, centered at 0.5, 1.5, ..., 23.5
ggplot(data=as.data.frame(table(Time)), aes(x=.5+0:23, y=Freq)) +
  geom_col(colour="steelblue4", fill="steelblue") +
  geom_text(aes(label=Freq), vjust=-0.5) +
  scale_x_continuous("Time",
    breaks=seq(0, 24, by=4),
    labels=c("00:00", "04:00", "08:00", "12:00", "16:00", "20:00", "24:00")) +
  scale_y_continuous("count", breaks = 0:15)

which leads to the following figure:

Histogram - possible solution

but I admit that it is somewhat inelegant, since it requires generating a separate data frame for the graph.
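A possibly tidier variant, sketched here rather than taken from either answer: stat_bin() (and therefore geom_histogram()) also accepts an explicit breaks vector, so the hourly bins can be requested directly on df1 and the labels drawn with after_stat(count), without building a separate data frame. This assumes ggplot2 3.3.0 or later for after_stat(); closed = "left" makes the bins [h, h+1) to match the cut(..., right = FALSE) table.

```r
library(ggplot2)

set.seed(1)
df1 <- data.frame(t = as.integer(rnorm(100, 25, 8)) / 2) %% 24

hours <- 0:24  # bin edges: [0,1), [1,2), ..., [23,24]

p <- ggplot(df1, aes(x = t)) +
  geom_histogram(breaks = hours, closed = "left",
                 colour = "steelblue4", fill = "steelblue") +
  # Same bins for the text layer, so each label sits above its own bar
  stat_bin(breaks = hours, closed = "left", geom = "text",
           aes(label = after_stat(count)), vjust = -0.5) +
  scale_x_continuous("Time", breaks = seq(0, 24, by = 4),
                     labels = c("00:00", "04:00", "08:00", "12:00",
                                "16:00", "20:00", "24:00")) +
  scale_y_continuous("count", breaks = 0:15)

print(p)
```

Because both layers receive the same breaks and closed arguments, the bars and their labels cannot drift apart.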
