Reputation: 379
I have a dataset where different cities go in and out of a program, like this example dataset:
example.dat <- data.frame (c(1000, 2000, 3000), c("15-10-01", "16-05-01", "16-07-01"), c("16-06-01", "16-10-01", "17-08-01"))
colnames(example.dat) <- c("Population", "Enter.Program", "Leave.Program")
This gives you a data frame that looks like:
Population Enter.Program Leave.Program
1000 15-10-01 16-06-01
2000 16-05-01 16-10-01
3000 16-07-01 17-08-01
First, I'd like to create an output table like this:
Per.Begin Per.End Total.Pop.In
15-10-01 16-04-30 1000
16-05-01 16-05-30 3000
16-06-01 16-06-30 2000
16-07-01 16-09-30 5000
16-10-01 17-07-30 3000
17-08-01 18-04-26 0
And then plot this in ggplot as a graph that looks like either a step function or kind of like a jagged rectangular surface where the top edge is the running total, kind of like a cumulative density function but where the y-axis can go down as well as up, and where the x-axis goes in steps that are the width of time periods.
Here are the steps I've blocked out, but I don't know how to execute:
Upvotes: 1
Views: 1371
Reputation: 4537
Using dplyr
(because you tagged the question with it) you can do what you want. The main things that need to happen are:
Code is below
library(dplyr)
library(ggplot2)
example.dat <- data.frame (c(1000, 2000, 3000), c("15-10-01", "16-05-01", "16-07-01"), c("16-06-01", "16-10-01", "17-08-01"))
colnames(example.dat) <- c("Population", "Enter.Program", "Leave.Program")
changes = example.dat %>%
select("Population","Date"="Enter.Program") %>%
bind_rows(example.dat %>%
select("Population","Date"="Leave.Program") %>%
mutate(Population = -1*Population)) %>%
mutate(Date = as.Date(Date,"%y-%m-%d"))
startDate = min(changes$Date)
endDate = max(changes$Date)
final = data_frame(Date = seq(startDate,endDate,1)) %>%
left_join(changes,by="Date") %>%
mutate(Population = cumsum(ifelse(is.na(Population),0,Population)))
ggplot(data = final,aes(x=Date,y=Population)) +
geom_line()
UPDATE
If you don't want to have every date from the earliest to the latest, you can use a blurgh for
loop to add the needed rows to get a pretty result. Here we walk through and duplicate each date after the first with the preceding cumulative sum. It's not pretty, but it makes the graph.
library(dplyr)
library(ggplot2)
example.dat <- data.frame (c(1000, 2000, 3000), c("15-10-01", "16-05-01", "16-07-01"), c("16-06-01", "16-10-01", "17-08-01"))
colnames(example.dat) <- c("Population", "Enter.Program", "Leave.Program")
changes = example.dat %>%
select("Population","Date"="Enter.Program") %>%
bind_rows(example.dat %>%
select("Population","Date"="Leave.Program") %>%
mutate(Population = -1*Population)) %>%
mutate(Date = as.Date(Date,"%y-%m-%d")) %>%
arrange(Date) %>%
mutate(Population = cumsum(Population))
for(i in nrow(changes):2){
changes = bind_rows(changes[1:(i-1),],
data_frame(Population = changes$Population[i-1],Date = changes$Date[i]),
changes[i:nrow(changes),])
}
ggplot(data = changes,aes(x=Date,y=Population)) +
geom_line()
Upvotes: 1