Reputation: 150
I want to sum a value by week. Sometimes the first or last week will have fewer than 7 days. In the example below the data starts on 2016-01-01, but the floor date for that week is 2015-12-27, so the weekly sum is based on two days instead of seven. I understand that this behaviour is completely logical, but I would like the first and last week (which might contain fewer than 7 days of data) not to show up as low values in the plot. How can I do this? Should I omit the first and last week? Should I use an average value here? How?
library(dplyr)
library(lubridate)
library(ggplot2)

# One year of simulated daily expenses
expenses <- tibble(
  date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = 1),
  amount = rgamma(length(date), shape = 2, scale = 20)
)

# Sum by week; floor_date() maps each date to the start of its week
plot_df <- expenses %>%
  mutate(Week = floor_date(date, "week")) %>%
  group_by(Week) %>%
  summarize(exp_sum = sum(amount))

ggplot(data = plot_df,
       aes(x = as.Date(Week), y = exp_sum)) +
  geom_line() +
  geom_point() +
  scale_x_date(date_breaks = "1 week", date_labels = "%W")
Upvotes: 1
Views: 66
Reputation: 1362
Because the first and last weeks may not cover the same number of days as the others, my first recommendation is to drop them: keep the data frame minus its first and last rows. This takes a single line.
plot_df <- plot_df[-c(1,nrow(plot_df)),]
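Dropping rows by position removes the first and last week whether or not they are actually short. In the question's data, assuming lubridate's default Sunday week start, the last week (2016-12-25 to 2016-12-31) is complete, so it would be discarded unnecessarily. A minimal sketch of an alternative is to count the days in each week and keep only the full ones. The names `weekly`, `n_days`, and `complete_weeks` are illustrative and not part of the original code, and the seed is arbitrary.

```r
library(dplyr)
library(lubridate)

set.seed(1)  # arbitrary seed so the simulated data is reproducible
expenses <- tibble(
  date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = 1),
  amount = rgamma(length(date), shape = 2, scale = 20)
)

# Weekly sum plus the number of days that contributed to it
weekly <- expenses %>%
  mutate(Week = floor_date(date, "week")) %>%
  group_by(Week) %>%
  summarize(exp_sum = sum(amount), n_days = n())

# Keep only weeks that contain all 7 days of data
complete_weeks <- weekly %>% filter(n_days == 7)
```

`complete_weeks` can then be passed to `ggplot()` in place of `plot_df`. This assumes the daily data has no gaps; a week with a missing day in the middle of the year would be dropped as well.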
The second option is to replace the two values with the mean of all the weekly sums. If you do this, you should say so when you present the results.
plot_df[c(1,nrow(plot_df)),"exp_sum"] <- mean(plot_df$exp_sum)
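A related idea, which is a different technique from substituting the overall mean, is to rescale each week to a 7-day equivalent using its own daily average. The sketch below assumes that the days observed in a partial week are representative of the whole week; the column names `n_days` and `exp_sum_7d` are hypothetical, and the seed is arbitrary.

```r
library(dplyr)
library(lubridate)

set.seed(1)  # arbitrary seed so the simulated data is reproducible
expenses <- tibble(
  date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = 1),
  amount = rgamma(length(date), shape = 2, scale = 20)
)

weekly <- expenses %>%
  mutate(Week = floor_date(date, "week")) %>%
  group_by(Week) %>%
  summarize(
    exp_sum = sum(amount),
    n_days = n(),
    # Daily mean times 7: identical to exp_sum for complete weeks,
    # an extrapolated estimate for partial weeks
    exp_sum_7d = mean(amount) * 7
  )
```

Plotting `exp_sum_7d` instead of `exp_sum` keeps every week on the same scale. An estimate built from only one or two days is noisy, so it is worth flagging the rescaled weeks in the plot or its caption.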
The last option is to copy the neighbouring value: the second week's sum into the first week, and the second-to-last week's sum into the last week:
plot_df[1,"exp_sum"] <- plot_df[2, "exp_sum"]
plot_df[nrow(plot_df), "exp_sum"] <- plot_df[nrow(plot_df)-1, "exp_sum"]
As I said, I would drop them.
Upvotes: 1