Reputation: 612
I would like to generate a dataframe dynamically, so that its row values update automatically as more data is collected and I can plot a bar chart with ggplot.
So far I only have data up to the previous month, and the current data looks like this:
Date Count
2018-09-01 12
2018-09-02 23
2018-09-03 5
2018-09-04 8
. .
. .
. .
2018-09-30 10
Moving forward, more data will be collected and there will be a value for the "Count" column for every single day.
I am able to convert the above df into a monthly.df using the following:
library(dplyr)
library(lubridate)  # provides floor_date()
df %>% group_by(month = floor_date(Date, "month")) %>% summarize(Count = sum(Count))
month Count
2018-09-01 165
If I plot a ggplot chart using this new df, it will give me only a single bar as there is currently no data for the other months. However, I would still like to plot a monthly chart with 0 for months where there are no values. My goal is to generate a dataframe that looks like this:
Year Month Count
2018 Jan 0
2018 Feb 0
2018 Mar 0
2018 Apr 0
2018 May 0
2018 Jun 0
2018 Jul 0
2018 Aug 0
2018 Sep 55
2018 Oct 0
2018 Nov 0
2018 Dec 0
So that I can plot a chart that looks like this:
library(ggplot2)
ggplot(monthly.users, aes(x= Month, y= Count, fill= Month)) + geom_bar(stat= "identity")
The values for the chart (i.e. for each month) should then be generated automatically as the data is collected.
I'm not sure whether I need to write a function that calculates the value for each month and then rbind the results into a final dataframe. I would greatly appreciate any help with this!
Upvotes: 0
Views: 55
Reputation: 66490
padr::pad is a useful function for this sort of thing.
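library(dplyr)
monthly.users <- df %>%
  group_by(month = lubridate::floor_date(Date, "1 month")) %>%
  summarize(Count = sum(Count)) %>%
  # Pad to every month of 2018; end_val covers the months after the last observation
  padr::pad(interval = "1 month",
            start_val = lubridate::ymd(20180101),
            end_val = lubridate::ymd(20181201)) %>%
  # Months added by pad() have NA counts; set them to 0
  mutate(Count = tidyr::replace_na(Count, 0))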
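If you would rather not depend on padr, a similar result can probably be had with tidyr::complete. The following is only a sketch under some assumptions: the sample df is made up to stand in for the daily data in the question, the year 2018 is hard-coded through all_months (a name introduced here for illustration), and the Year and Month columns are built to match the layout the question asks for.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical stand-in for the daily data in the question (September 2018 only).
df <- data.frame(
  Date  = seq(as.Date("2018-09-01"), as.Date("2018-09-30"), by = "day"),
  Count = rep(c(12, 23, 5), times = 10)
)

# Assumption: the chart covers calendar year 2018, one first-of-month date per month.
all_months <- seq(as.Date("2018-01-01"), as.Date("2018-12-01"), by = "month")

monthly.users <- df %>%
  group_by(month = floor_date(Date, "month")) %>%
  summarize(Count = sum(Count)) %>%
  # Add a row for every month missing from the data, with Count filled as 0.
  complete(month = all_months, fill = list(Count = 0)) %>%
  mutate(
    Year  = lubridate::year(month),
    # month.abb is base R's c("Jan", ..., "Dec"); using it as the factor
    # levels keeps the months in calendar order on the x axis.
    Month = factor(month.abb[lubridate::month(month)], levels = month.abb)
  ) %>%
  select(Year, Month, Count)
```

Because everything is recomputed from df, rerunning the pipeline after new rows arrive updates the monthly totals. The result can be passed to the ggplot call from the question unchanged; geom_col() is a shorthand for geom_bar(stat = "identity").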
Upvotes: 1