ZPeh

Reputation: 612

How can I dynamically generate a dataframe as I collect more values in R?

I would like to generate a dataframe dynamically, such that it will automatically update the row values as there is more data being collected, so that I can plot a bar chart in ggplot.

As of now, I only have data till the previous month and the current data looks like this:

Date            Count
2018-09-01         12
2018-09-02         23
2018-09-03          5
2018-09-04          8
.                   .
.                   .
.                   .
2018-09-30         10

Moving forward, more data will be collected and there will be a value for the "Count" column for every single day.

I am able to convert the above df into a monthly.df using the following:

library(dplyr)
library(lubridate)  # provides floor_date()
df %>% group_by(month = floor_date(Date, "month")) %>% summarize(Count = sum(Count))

month      Count
2018-09-01   165

If I plot a ggplot chart using this new df, it will give me only a single bar as there is currently no data for the other months. However, I would still like to plot a monthly chart with 0 for months where there are no values. My goal is to generate a dataframe that looks like this:

Year Month Count
2018   Jan     0
2018   Feb     0
2018   Mar     0
2018   Apr     0
2018   May     0
2018   Jun     0
2018   Jul     0
2018   Aug     0
2018   Sep    55
2018   Oct     0
2018   Nov     0
2018   Dec     0

So that I can plot a chart that looks like this:

library(ggplot2)
ggplot(monthly.users, aes(x= Month, y= Count, fill= Month)) + geom_bar(stat= "identity")

[Image: Month on Month Chart]

And the values for the charts (i.e. each month) will be automatically generated as the data is being collected.

I am not sure whether I need to write a function that calculates the value for each month and then rbind the results into a final dataframe. I would greatly appreciate any help with this!

Upvotes: 0

Views: 55

Answers (1)

Jon Spring

Reputation: 66490

padr::pad is a useful function for this sort of thing.

monthly.users <- df %>% 
  group_by(month = lubridate::floor_date(Date, "1 month")) %>%  
  summarize(Count=sum(Count)) %>%
  padr::pad(start_val = lubridate::ymd(20180101), 
            interval = "1 month") %>%
  mutate(Count = tidyr::replace_na(Count, 0))
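
If the chart should also show the months after the latest observation (Oct to Dec in the question's target table), one possible variant is tidyr::complete() with an explicit sequence of months. The sketch below is only an illustration under some assumptions: the sample df is made up to stand in for the real data, the year 2018 is hard-coded, and the Year/Month columns are derived with lubridate to match the layout asked for in the question.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical sample data standing in for the question's `df`:
# one row per day of September 2018 with made-up counts.
df <- tibble(
  Date  = seq(ymd("2018-09-01"), ymd("2018-09-30"), by = "day"),
  Count = rep(c(12, 23, 5), length.out = 30)
)

monthly.users <- df %>%
  group_by(month = floor_date(Date, "month")) %>%
  summarize(Count = sum(Count)) %>%
  # Add a row for every month of 2018; months without data get Count = 0.
  complete(
    month = seq(ymd("2018-01-01"), ymd("2018-12-01"), by = "month"),
    fill = list(Count = 0)
  ) %>%
  # Split the month date into the Year / Month columns from the question.
  mutate(
    Year  = lubridate::year(month),
    Month = lubridate::month(month, label = TRUE, abbr = TRUE)
  ) %>%
  select(Year, Month, Count)

print(monthly.users)
```

Because nothing is stored between runs, re-running this pipeline after new daily rows are added to df recomputes the monthly totals. Month comes back as an ordered factor, so ggplot keeps the bars in calendar order; the labels themselves depend on the session locale.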

Upvotes: 1
