user2840286

Reputation: 601

Plotting histogram for data with start and end date

I have a data set that is something like this:

          start_date       end_date        outcome
1         2014-07-18       2014-08-20         TRUE
2         2014-08-04       2014-09-23         TRUE
3         2014-08-01       2014-09-03         TRUE
4         2014-08-01       2014-09-03         TRUE
5         2014-12-10       2014-12-10         TRUE
6         2014-10-11       2014-11-07         TRUE
7         2015-04-27       2015-05-20         TRUE
8         2014-11-22       2014-12-25         TRUE
9         2015-03-24       2015-04-26         TRUE
10        2015-03-12       2015-04-10        FALSE
11        2014-05-29       2014-06-28        FALSE
12        2015-03-19       2015-04-20         TRUE
13        2015-03-25       2015-04-26         TRUE
14        2015-03-25       2015-04-26         TRUE
15        2014-07-09       2014-08-10         TRUE
16        2015-03-26       2015-04-26         TRUE
17        2014-07-09       2014-08-10         TRUE
18        2015-03-30       2015-04-28         TRUE
19        2014-03-13       2014-04-13         TRUE
20        2015-04-01       2015-04-29         TRUE

I want to plot a histogram in which each bar corresponds to a month and shows the proportion of FALSE outcomes in that month, i.e. FALSE / (FALSE + TRUE).

What is the easiest way to do this in R, preferably using ggplot?

Upvotes: 0

Views: 606

Answers (1)

jazzurro

Reputation: 23574

Here is one way; there are probably better ones, but this is what I tried. The main job was to create a new data frame for the plot. Using your data above, I first converted the factors to date objects; if your columns are already dates, you can skip that step. Then I counted the outcomes per month for start_date and end_date separately using count(), bound the two data frames together, and calculated the proportion of FALSE for each month.

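library(zoo)
library(dplyr)
library(ggplot2)
library(lubridate)
library(scales)  # provides date_format() and date_breaks() used below

# Parse the two date columns, then reduce each to a "year-month" label.
# across() replaces the deprecated mutate_each()/funs(); it needs dplyr >= 1.0.
mydf %>%
  mutate(across(-outcome, ~ as.POSIXct(.x, format = "%Y-%m-%d"))) %>%
  mutate(across(-outcome, ~ paste(year(.x), "-", month(.x), sep = ""))) -> foo1

# Count outcomes per start month
count(foo1, start_date, outcome) %>% rename(date = start_date) -> foo2

# Count outcomes per end month, add the start-month counts, and compute
# the share of FALSE among all dates in each month
count(foo1, end_date, outcome) %>%
  rename(date = end_date) %>%
  bind_rows(foo2) %>%
  group_by(date, outcome) %>%
  summarize(total = sum(n)) %>%
  # Weight by the counts: counting rows would give at most 1 FALSE per month
  summarize(prop = sum(total[outcome == FALSE]) / sum(total)) %>%
  mutate(date = as.Date(as.yearmon(date))) -> foo3

# One bar per month, with the month shown as "YYYY-MM" on the x axis
ggplot(data = foo3, aes(x = date, y = prop)) +
  geom_bar(stat = "identity") +
  scale_x_date(labels = date_format("%Y-%m"), breaks = date_breaks("month")) +
  theme(axis.text.x = element_text(angle = 90, vjust = 1))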
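
If you want to check the proportion step on its own, here is a minimal self-contained sketch. It is not the pipeline above but a shorter route to the same per-month numbers: it stacks the start and end dates into one column and takes the mean of !outcome per month. It assumes outcome is a logical column and dplyr 1.0 or later. The small mydf is just rows 9 to 12 of your data typed in by hand, and the names long and props are made up for this illustration.

```r
library(dplyr)

# Hypothetical sample: rows 9-12 of the question's data, typed in by hand
mydf <- data.frame(
  start_date = c("2015-03-24", "2015-03-12", "2014-05-29", "2015-03-19"),
  end_date   = c("2015-04-26", "2015-04-10", "2014-06-28", "2015-04-20"),
  outcome    = c(TRUE, FALSE, FALSE, TRUE)
)

# Stack start and end dates into one column, keeping each row's outcome
long <- bind_rows(
  transmute(mydf, date = start_date, outcome),
  transmute(mydf, date = end_date, outcome)
)

# Reduce each date to a "YYYY-MM" label, then take the share of FALSE
# outcomes among all dates that fall in that month
props <- long %>%
  mutate(month = format(as.Date(date), "%Y-%m")) %>%
  group_by(month) %>%
  summarise(prop = mean(!outcome), .groups = "drop")

print(props)
```

For plotting, props$month can be turned back into a date with as.Date(paste0(month, "-01")) and passed to the same ggplot() call as above.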

Upvotes: 1
