user709413

Reputation: 515

Density plot based on time of the day

I have the following dataset:

https://app.box.com/s/au58xaw60r1hyeek5cua6q20byumgvmj

I want to create a density plot based on the time of the day. Here is what I've done so far:

library("ggplot2")
library("scales")
library("lubridate")

timestamp_df$timestamp_time <- format(ymd_hms(hn_tweets$timestamp), "%H:%M:%S")

ggplot(timestamp_df, aes(timestamp_time)) + 
       geom_density(aes(fill = ..count..)) +
       scale_x_datetime(breaks = date_breaks("2 hours"),labels=date_format("%H:%M"))

It gives the following error: Error: Invalid input: time_trans works with objects of class POSIXct only

If I convert that to POSIXct, it adds dates to the data.

Update 1

The following converted the data to NA:

timestamp_df$timestamp_time <- as.POSIXct(timestamp_df$timestamp_time, format = "%H:%M%:%S", tz = "UTC")

Update 2

The following is what I want to achieve: [image: target density plot]

Upvotes: 1

Views: 4014

Answers (2)

reisner

Reputation: 258

One problem with the solutions posted here is that they ignore the fact that this data is circular (i.e. 00:00 and 24:00 are the same point). You can see in the plots in the other answer that the two ends of the curve don't match up. This won't make much difference for this particular dataset, but for events that happen near midnight, a linear estimator can give a badly biased density. Here's my solution, which takes the circular nature of time data into account:

# modified code from https://freakonometrics.hypotheses.org/2239

library(dplyr)
library(ggplot2)
library(lubridate)
library(circular)

df = read.csv("data.csv")
# Parse "month/day/year hour:minute"; %H is the hour on a 24-hour clock
datetimes = df$timestamp %>%
  lubridate::parse_date_time("%m/%d/%Y %H:%M")
times_in_decimal = lubridate::hour(datetimes) + lubridate::minute(datetimes) / 60
times_in_radians = 2 * pi * (times_in_decimal / 24)

# Doing this just for bandwidth estimation:
basic_dens = density(times_in_radians, from = 0, to = 2 * pi)

res = circular::density.circular(circular::circular(times_in_radians,
                                                    type = "angle",
                                                    units = "radians",
                                                    rotation = "clock"),
                                 kernel = "wrappednormal",
                                 bw = basic_dens$bw)

time_pdf = data.frame(time = as.numeric(24 * (2 * pi + res$x) / (2 * pi)), # Convert from radians back to 24h clock
                      likelihood = res$y)

p = ggplot(time_pdf) +
  geom_area(aes(x = time, y = likelihood), fill = "#619CFF") +
  scale_x_continuous("Hour of Day", labels = 0:24, breaks = 0:24) +
  scale_y_continuous("Likelihood of Data") +
  theme_classic()
p  # print the plot

[Figure: density plot considering circular data]

Note that the values and slopes of the density plot match up at the 00h and 24h points.
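To make the boundary effect concrete, here is a minimal base-R sketch on simulated data (not the OP's dataset). It compares a plain linear kernel density on [0, 24] with a simple "wrapped" estimate obtained by copying the sample one day earlier and one day later. The sample, the hand-picked bandwidth, and the names `linear` and `wrapped` are all assumptions for illustration; the replication trick is a stand-in for the wrapped-normal kernel used above, not the same computation.

```r
# Simulated event times in hours, clustered around midnight (illustrative only)
set.seed(1)
times <- rnorm(500, mean = 0, sd = 1.5) %% 24

bw <- 0.5  # bandwidth in hours, picked by hand for this sketch

# Linear KDE: kernel mass that falls outside [0, 24] is simply lost
linear <- density(times, bw = bw, from = 0, to = 24, n = 512)

# Wrapped KDE: add copies shifted by -24 h and +24 h, estimate on [0, 24],
# then multiply by 3 because the tripled sample spreads the mass over 3 days
wrapped <- density(c(times - 24, times, times + 24),
                   bw = bw, from = 0, to = 24, n = 512)
wrapped$y <- wrapped$y * 3

# Compare the estimates at the two ends of the day
n <- length(linear$y)
c(linear_00h = linear$y[1], linear_24h = linear$y[n])
c(wrapped_00h = wrapped$y[1], wrapped_24h = wrapped$y[n])
```

In this sketch the wrapped estimate should agree at 00h and 24h and sit well above the linear one at both ends, because the linear estimate only sees the half of the midnight cluster on its own side of the boundary.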

Upvotes: 5

missuse

Reputation: 19756

Here is one approach:

library(ggplot2)
library(lubridate)
library(scales)

df <- read.csv("data.csv") #given in OP

Convert the character column to POSIXct:

df$timestamp <- as.POSIXct(strptime(df$timestamp, "%m/%d/%Y %H:%M",  tz = "UTC"))

library(hms)

Extract the hour, minute and second as an hms time of day:

df$time <- hms::hms(second(df$timestamp), minute(df$timestamp), hour(df$timestamp))  

Convert to POSIXct again, since scale_x_datetime() only works with POSIXct. Every time ends up on the same dummy date (1970-01-01), so only the time of day varies:

df$time <- as.POSIXct(df$time)


ggplot(df, aes(time)) + 
  geom_density(fill = "red", alpha = 0.5) + #also play with adjust such as adjust = 0.5
  scale_x_datetime(breaks = date_breaks("2 hours"), labels=date_format("%H:%M"))
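The same "time of day on a dummy date" idea can be sketched in base R without hms, which may help if the extra date in the POSIXct values looks worrying. The timestamps below are made up for illustration, and `secs` and `time_of_day` are hypothetical names; this assumes the timestamps are in UTC.

```r
# Hypothetical timestamps on different dates (not the OP's data)
ts <- as.POSIXct(c("2017-03-01 08:15:00",
                   "2017-03-02 23:40:00",
                   "2017-03-05 08:45:00"), tz = "UTC")

# Seconds since midnight (UTC): drop the whole days from the epoch seconds
secs <- as.numeric(ts) %% 86400

# Re-anchor every time on 1970-01-01 so only the time of day differs
time_of_day <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

format(time_of_day, "%Y-%m-%d %H:%M")
# "1970-01-01 08:15" "1970-01-01 23:40" "1970-01-01 08:45"
```

The date is still there, but it is identical for every row, and the `%H:%M` labels on the axis hide it.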


To plot it scaled to a maximum of 1:

ggplot(df) + 
  geom_density( aes(x = time, y = ..scaled..), fill = "red", alpha = 0.5) +
  scale_x_datetime(breaks = date_breaks("2 hours"), labels=date_format("%H:%M"))

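where ..scaled.. is a variable computed by stat_density during plot creation: the density estimate scaled to a maximum of 1. In ggplot2 3.3.0 and later it can also be written as after_stat(scaled).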
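As a rough illustration of what that scaling means, the base-R sketch below divides a kernel density estimate by its own maximum. The sample `x` is simulated, and this is only my understanding of the scaled variable, not ggplot2's internal code.

```r
set.seed(42)
x <- rnorm(200)            # hypothetical sample, for illustration only

d <- density(x)            # ordinary kernel density estimate
scaled <- d$y / max(d$y)   # rescale so the peak of the curve is exactly 1

range(scaled)              # the upper end of the range is 1
```

The shape of the curve is unchanged; only the y axis is rescaled, which makes curves from groups of different sizes easier to compare.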


Upvotes: 1
