I have a data set of events that happen at certain times of the day, and I'd like to make a histogram of the number of events per hour. I looked up "R - emulate the default behavior of hist() with ggplot2 for bin width" and "R hist vs geom_hist break points", which brought me this far, but this is still not what I want. I tried this:
library(ggplot2)
set.seed(1)
# 100 sample event times in half-hour steps, wrapped into [0, 24)
df1 = data.frame(t=as.integer(rnorm(100, 25, 8)) / 2) %% 24
ggplot(data=df1, aes(x=t)) +
geom_histogram(binwidth=1, colour="steelblue4", fill="steelblue") +
stat_bin(binwidth = 1, aes(label=after_stat(count)), vjust=-0.5, geom = "text") +
scale_x_continuous("Time",
breaks=seq(0, 23, by=4),
labels=c("00:00", "04:00", "08:00", "12:00", "16:00", "20:00")) +
scale_y_continuous(breaks = 0:15)
and got this image:
This histogram is 1) incorrect and 2) not showing what I want. It gives the impression that there are two events at (or around) 04:00, but the data contain one event at 3.5 (i.e. 03:30) and one at 4.5 (04:30). What I actually want is a histogram of the number of events in each of the ranges [00:00, 01:00), [01:00, 02:00), ..., [23:00, 24:00), so the event at 03:30 should be assigned to a different bin than the event at 04:30. I'd also like the histogram to span the whole day, from 00:00 to 24:00. Something like this (photoshopped!):
which is congruent with
Time <- cut(df1$t, breaks = 0:24, dig.lab = 4, right = FALSE)
as.data.frame(table(Time))
Time Freq
1 [0,1) 0
2 [1,2) 0
3 [2,3) 0
4 [3,4) 1
5 [4,5) 1
6 [5,6) 1
7 [6,7) 3
8 [7,8) 4
9 [8,9) 2
10 [9,10) 7
11 [10,11) 11
12 [11,12) 8
13 [12,13) 12
14 [13,14) 10
15 [14,15) 14
16 [15,16) 8
17 [16,17) 6
18 [17,18) 4
19 [18,19) 5
20 [19,20) 0
21 [20,21) 1
22 [21,22) 1
23 [22,23) 1
24 [23,24) 0
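The hourly counts in the table above follow a simple rule: an event at time t belongs to the half-open hour [h, h+1) exactly when h = floor(t). The short base-R sketch below (sample data rebuilt from the question's code) checks that counting by floor() gives the same result as the cut(..., right = FALSE) call, so 03:30 lands in hour 3 and 04:30 in hour 4.

```r
# Rebuild the question's sample data
set.seed(1)
df1 <- data.frame(t = as.integer(rnorm(100, 25, 8)) / 2) %% 24

# An event at time t lies in the half-open hour [h, h + 1) exactly when
# h == floor(t), e.g. floor(3.5) is 3 (03:30) and floor(4.5) is 4 (04:30).
# Fixing the factor levels to 0:23 keeps the empty hours in the table.
by_floor <- as.vector(table(factor(floor(df1$t), levels = 0:23)))

# The cut() call from the question, with right = FALSE for [h, h + 1)
by_cut <- as.vector(table(cut(df1$t, breaks = 0:24, right = FALSE)))

identical(by_floor, by_cut)  # TRUE
```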
Is this possible at all using geom_histogram() and, if not, what else should I use?
Try adjusting only the geom_histogram() call, using the following arguments:
geom_histogram(binwidth=1, center = 0.5, closed = "left", colour="steelblue4", fill="steelblue")
This centres the first bin at 0.5, so with a binwidth of 1 it extends from 0 to 1, and every subsequent bin is likewise one unit wide. By default ggplot2 closes bins on the right, so a value lying exactly on a boundary (e.g. 4.0) would be counted in the bin below it; closed = "left" makes each bin [h, h+1), as the question asks. This worked for me when I wanted weekly bins on time data (e.g., 1-7 is one bin because it equals 1 week, 8-14 is another bin because it equals 2 weeks, and so on).
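A fuller sketch along the same lines, assuming ggplot2 3.3.0 or later (for after_stat()): instead of binwidth/center, the bin edges can be passed directly through the breaks argument of geom_histogram() and stat_bin(). That fixes the bins at every full hour from 0 to 24, so the plot covers the whole day even where there are no events, while closed = "left" gives [h, h+1) bins. The sample data are rebuilt from the question; colours and axis labels follow the question's code.

```r
library(ggplot2)

set.seed(1)
df1 <- data.frame(t = as.integer(rnorm(100, 25, 8)) / 2) %% 24

hour_breaks <- 0:24  # one bin per hour, 00:00 to 24:00

p <- ggplot(df1, aes(x = t)) +
  # breaks overrides binwidth/center/boundary; closed = "left" gives [h, h + 1)
  geom_histogram(breaks = hour_breaks, closed = "left",
                 colour = "steelblue4", fill = "steelblue") +
  # same bins again, drawn as count labels above the bars
  stat_bin(aes(label = after_stat(count)), breaks = hour_breaks,
           closed = "left", geom = "text", vjust = -0.5) +
  scale_x_continuous("Time",
                     breaks = seq(0, 24, by = 4),
                     labels = c("00:00", "04:00", "08:00", "12:00",
                                "16:00", "20:00", "24:00")) +
  scale_y_continuous("count", breaks = 0:15)

print(p)
```

Hours without events get a zero-height bar and a "0" label. layer_data(p) returns the computed bins (xmin, xmax, count), which can be compared with the cut() table from the question.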
A solution may be to use geom_col() instead of geom_histogram():
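# Count the events in each hour [0,1), [1,2), ..., [23,24), keeping empty hours
Time <- cut(df1$t, breaks = 0:24, dig.lab = 4, right = FALSE)
# One row per hour; x = .5 + 0:23 places each bar in the middle of its hour
ggplot(data=as.data.frame(table(Time)), aes(x=.5+0:23, y=Freq)) +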
geom_col(colour="steelblue4", fill="steelblue") +
geom_text(aes(label=Freq), vjust=-0.5) +
scale_x_continuous("Time",
breaks=seq(0, 24, by=4),
labels=c("00:00", "04:00", "08:00", "12:00", "16:00", "20:00", "24:00")) +
scale_y_continuous("count", breaks = 0:15)
which leads to the following figure:
but I admit that it is somewhat inelegant, since it requires generating a separate data frame for the graph.
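If the separate data frame is the main objection, one alternative (a sketch, assuming ggplot2 3.3.0 or later) is to compute the hour inside aes() and let geom_bar() do the counting: floor(t) + 0.5 puts each event in the middle of its hour, and width = 1 makes the bars fill the hour. Unlike the cut()/table() version, hours without events get no bar and no "0" label; limits = c(0, 24) keeps the axis spanning the whole day.

```r
library(ggplot2)

set.seed(1)
df1 <- data.frame(t = as.integer(rnorm(100, 25, 8)) / 2) %% 24

# floor(t) is the hour bin [h, h + 1); + 0.5 centres the bar in that hour.
# geom_bar() counts the events per hour itself, so no summary data frame.
p2 <- ggplot(df1, aes(x = floor(t) + 0.5)) +
  geom_bar(width = 1, colour = "steelblue4", fill = "steelblue") +
  # count labels computed by the same stat as the bars
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5) +
  scale_x_continuous("Time", limits = c(0, 24),
                     breaks = seq(0, 24, by = 4),
                     labels = c("00:00", "04:00", "08:00", "12:00",
                                "16:00", "20:00", "24:00")) +
  scale_y_continuous("count", breaks = 0:15)

print(p2)
```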