I have a data set that is something like this:
start_date end_date outcome
1 2014-07-18 2014-08-20 TRUE
2 2014-08-04 2014-09-23 TRUE
3 2014-08-01 2014-09-03 TRUE
4 2014-08-01 2014-09-03 TRUE
5 2014-12-10 2014-12-10 TRUE
6 2014-10-11 2014-11-07 TRUE
7 2015-04-27 2015-05-20 TRUE
8 2014-11-22 2014-12-25 TRUE
9 2015-03-24 2015-04-26 TRUE
10 2015-03-12 2015-04-10 FALSE
11 2014-05-29 2014-06-28 FALSE
12 2015-03-19 2015-04-20 TRUE
13 2015-03-25 2015-04-26 TRUE
14 2015-03-25 2015-04-26 TRUE
15 2014-07-09 2014-08-10 TRUE
16 2015-03-26 2015-04-26 TRUE
17 2014-07-09 2014-08-10 TRUE
18 2015-03-30 2015-04-28 TRUE
19 2014-03-13 2014-04-13 TRUE
20 2015-04-01 2015-04-29 TRUE
I want to plot a bar chart where each bar corresponds to a month and shows the proportion of FALSE outcomes out of all outcomes in that month, that is FALSE / (FALSE + TRUE).
What is the easiest way to do this in R, preferably using ggplot2?
Here is one way; there are probably better ones. The main job is to build a new data frame for the graphic. Using your data above, I first converted the factor columns to date objects; if you already have date objects in your data, you can skip this step. Then I summarised your data by month for start_date and for end_date using count(), bound the two data frames together, and calculated the proportion of FALSE for each month.
library(zoo)        # as.yearmon()
library(dplyr)
library(ggplot2)
library(lubridate)  # year(), month()
library(scales)     # date_format(), date_breaks()

# Convert both date columns to Date, then to year-month labels such as "2014-7"
mydf %>%
    mutate(across(-outcome, ~ as.Date(.x, format = "%Y-%m-%d"))) %>%
    mutate(across(-outcome, ~ paste(year(.x), "-", month(.x), sep = ""))) -> foo1

# Count outcomes per month, once by start_date and once by end_date
count(foo1, start_date, outcome) %>% rename(date = start_date) -> foo2

count(foo1, end_date, outcome) %>%
    rename(date = end_date) %>%
    bind_rows(foo2) %>%
    group_by(date, outcome) %>%
    summarize(total = sum(n), .groups = "drop_last") %>%
    # Still grouped by date here: FALSE count divided by all counts in the month
    summarize(prop = sum(total[outcome == FALSE]) / sum(total)) %>%
    mutate(date = as.Date(as.yearmon(date))) -> foo3

ggplot(data = foo3, aes(x = date, y = prop)) +
    geom_bar(stat = "identity") +
    scale_x_date(labels = date_format("%Y-%m"), breaks = date_breaks("month")) +
    theme(axis.text.x = element_text(angle = 90, vjust = 1))
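To try this out or sanity-check the proportions, here is a minimal base R sketch. It assumes the sample data from the question is typed in by hand, with dates as character strings and outcome as a logical column. The names long and prop_false are made up for this illustration. Like the code above, it counts each row once under its start month and once under its end month.

```r
# Sample data copied from the question (20 rows).
mydf <- data.frame(
    start_date = c("2014-07-18", "2014-08-04", "2014-08-01", "2014-08-01",
                   "2014-12-10", "2014-10-11", "2015-04-27", "2014-11-22",
                   "2015-03-24", "2015-03-12", "2014-05-29", "2015-03-19",
                   "2015-03-25", "2015-03-25", "2014-07-09", "2015-03-26",
                   "2014-07-09", "2015-03-30", "2014-03-13", "2015-04-01"),
    end_date   = c("2014-08-20", "2014-09-23", "2014-09-03", "2014-09-03",
                   "2014-12-10", "2014-11-07", "2015-05-20", "2014-12-25",
                   "2015-04-26", "2015-04-10", "2014-06-28", "2015-04-20",
                   "2015-04-26", "2015-04-26", "2014-08-10", "2015-04-26",
                   "2014-08-10", "2015-04-28", "2014-04-13", "2015-04-29"),
    outcome    = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
                   FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
    stringsAsFactors = FALSE
)

# Stack the start and end months ("YYYY-MM") so every row is counted
# once per date column (hypothetical helper data frame).
long <- data.frame(
    month   = c(substr(mydf$start_date, 1, 7), substr(mydf$end_date, 1, 7)),
    outcome = c(mydf$outcome, mydf$outcome),
    stringsAsFactors = FALSE
)

# The mean of !outcome within a month is the share of FALSE values in that month.
prop_false <- tapply(!long$outcome, long$month, mean)
print(round(prop_false, 3))
```

For the sample data this gives 1 for 2014-05 and 2014-06, 1/7 for 2015-03, 0.1 for 2015-04, and 0 for every other month, which is what the bars in the plot should show.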