Marco

Reputation: 2797

How to represent the timeline of events (date-times) in R

I have a data.frame with a date-time-column, like

 D = data.frame(time = c("2007-06-22","2007-05-22","2007-05-23"))
 D$time <- strptime(D$time, format = "%m/%d/%Y")
 class(D$time)
"POSIXlt" "POSIXt" 

I would like to create a plot with an additional neutral timeline on the x-axis, say, for year 2007, ticks/units should be month. So just a "histogram" of dates.

I tried hist(D$time, breaks = "days") but it just returns errors.

Otherwise dates could be plotted for the given time-interval (say year 2007) for each single date, on a metric scale. So something like "geom_jitter".

I tried ggplot(D$time) but it can't handle the POSIXlt/POSIXt class.

I am looking for an easy, straightforward way to plot my time events within a given interval. Thank you so much.

Upvotes: 0

Views: 985

Answers (1)

clemens

Reputation: 6813

You can use ggplot2 and scales to achieve this:

library(ggplot2)
library(scales)

First create a ggplot with data = D and time as your x aesthetic. Add geom_bar() (i.e. the bars), format the x axis to show only the month, and set specific limits (in this case the first and last day of 2007):

# scale_x_date() requires a Date column, so convert time first
D$time <- as.Date(D$time)

ggplot(data = D, aes(x = time)) + geom_bar() + 
  scale_x_date(labels = date_format("%b"),
               limits = c(as.Date('2007-01-01'), as.Date('2007-12-31')))

Which returns:

ggplot_output
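
scale_x_date() only accepts a column of class Date, so a character or POSIXlt column has to be converted before plotting. A minimal sketch of that conversion in base R, assuming the dates are ISO-formatted strings as in the question (the intermediate names time_chr and time_lt are illustrative, not from the original post):

```r
# Example strings from the question, in ISO "YYYY-MM-DD" form
time_chr <- c("2007-06-22", "2007-05-22", "2007-05-23")

# The format string has to match the input; a mismatched one
# such as "%m/%d/%Y" would turn these strings into NA
time_lt <- strptime(time_chr, format = "%Y-%m-%d", tz = "UTC")

# Convert POSIXlt to Date so that scale_x_date() can handle the column
D <- data.frame(time = as.Date(time_lt))
class(D$time)
```

If the strings are already in ISO form, as.Date(time_chr) alone should be enough; the strptime() step is only shown because the question goes through POSIXlt.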

If you want to show the events per month, you could use lubridate, dplyr, and ggplot2:

library(dplyr)
library(lubridate)
library(ggplot2)

D = data.frame(time = c("2007-06-22","2007-05-22","2007-05-23"))

In this case you get the abbreviated month of the date:

D2 <- D
D2$month <- month(D$time, label = TRUE)

You can group by month and count the number of events:

D2 <- D2 %>% 
  group_by(month) %>%
  summarise(n = n())

Add the missing months (if any) to your dataframe with n = 0:

D2 <- rbind(D2, 
            data.frame(month = levels(D2$month)[!(levels(D2$month) %in% D2$month)],
                       n = 0))

Plot the new data (note: use stat = 'identity' in geom_bar(), since you explicitly pass the count in the y aesthetic):

ggplot(data = D2, aes(x = month, y = n)) + 
  geom_bar(stat = 'identity')

Which returns:

output2
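
As an alternative sketch for the counting step, base R's table() reports a count for every level of a factor, so months without events come out as 0 without being added by hand. This swaps in a different technique from the dplyr/lubridate steps above; the names dates, month_f, and counts are illustrative:

```r
dates <- as.Date(c("2007-06-22", "2007-05-22", "2007-05-23"))

# Month as a factor with all 12 levels, so that empty months are kept.
# month.abb is a built-in constant holding English month abbreviations.
month_f <- factor(month.abb[as.integer(format(dates, "%m"))],
                  levels = month.abb)

# table() counts every level, including those with zero events
counts <- as.data.frame(table(month = month_f), responseName = "n")
counts
```

The resulting counts data frame has a month column and an n column, so it can be passed to the same ggplot(aes(x = month, y = n)) + geom_bar(stat = 'identity') call as D2.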

Option number 3:

A more flexible approach using many years:

D = data.frame(time = c("2006-05-16", "2007-06-22","2007-05-22","2007-05-23")) 

(Note: one date from a different year has been added.)

Create an additional year column:

D3 <- D
D3$month <- month(D$time, label = TRUE)
D3$year <- year(D$time)

Group by month and year:

D3 <- D3 %>% 
  group_by(year, month) %>%
  summarise(n = n())

Find the missing months per year:

missing <- do.call("rbind", 
                   lapply(unique(D3$year), function(y) {
                     # months that actually occur in year y
                     present <- D3[D3$year == y, ]$month
                     data.frame(year = y,
                                month = levels(present)[!(levels(present) %in% present)],
                                n = 0)
                   }))

Combine D3 and missing:

all <- rbind(as.data.frame(D3), missing)

Create new visualisation:

ggplot(data = all, aes(x = month, y = n, group = factor(year), fill = factor(year))) + 
  geom_bar(position = "dodge", stat = 'identity')

Which looks like this:

ggplot_3
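
The question also mentions plotting each single date on a continuous scale, similar to geom_jitter. A rough base R sketch of that idea follows; it is not part of the ggplot2 approach above, and the names dates, year_range, and month_ticks are illustrative:

```r
dates <- as.Date(c("2007-06-22", "2007-05-22", "2007-05-23"))

# Fixed interval for the x axis: the whole of 2007, with one tick per month
year_range  <- as.Date(c("2007-01-01", "2007-12-31"))
month_ticks <- seq(year_range[1], year_range[2], by = "month")

# One point per event; the y position is only jittered to reduce overplotting
plot(dates, jitter(rep(1, length(dates)), amount = 0.1),
     xlim = year_range, ylim = c(0.5, 1.5),
     xaxt = "n", yaxt = "n", xlab = "2007", ylab = "", pch = 19)

# Draw the month axis with abbreviated month names
axis.Date(1, at = month_ticks, format = "%b")
```

The vertical position carries no meaning here; only the horizontal position on the 2007 axis encodes the event date.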

Upvotes: 1
