Niels

Reputation: 150

How to aggregate weekly with incomplete first and last week?

I want to sum a value by week. Sometimes the first or last week will have fewer than 7 days. In the example below the data starts on 2016-01-01, but the floor date for that week is 2015-12-27, so the weekly sum is based on two days instead of seven. I understand that this behaviour is completely logical, but I would like the first and last weeks (which might contain fewer than 7 days of data) not to show up as low values in the plot. How can I do this? Should I omit the first and last week? Should I use an average value here? How?

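library(dplyr)      # tibble(), %>%, mutate(), group_by(), summarize()
library(lubridate)  # floor_date()
library(ggplot2)    # plotting

# data_frame() is deprecated; tibble() is its replacement
expenses <- tibble(
  date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = 1),
  amount = rgamma(length(date), shape = 2, scale = 20))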

plot_df <-  expenses %>% 
  mutate(Week = floor_date(date, "week")) %>%  
  group_by(Week) %>% 
  summarize(exp_sum = sum(amount))

ggplot(data = plot_df, 
       aes(x = as.Date(Week), y = exp_sum)) + 
  geom_line() +
  geom_point() + 
  scale_x_date(date_breaks = "1 week", date_labels = "%W")

Plot Example

Upvotes: 1

Views: 66

Answers (1)

Tito Sanz

Reputation: 1362

Because the first and last weeks do not cover the same number of days as the others, my first recommendation would be to drop them: keep every row of plot_df except the first and the last. This is simple and takes one line.

plot_df <- plot_df[-c(1,nrow(plot_df)),]
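Note that dropping rows by position also removes a first or last week that happens to be complete. In the example data, 2016-12-31 is a Saturday, so with lubridate's default Sunday-based weeks the last week actually has all seven days. A minimal sketch of an alternative, assuming the same expenses columns as in the question: count the days in each week and keep only the full ones. The names full_weeks and n_days are made up for illustration.

```r
library(dplyr)
library(lubridate)

set.seed(1)  # only so the random example data is reproducible
expenses <- tibble(
  date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = 1),
  amount = rgamma(length(date), shape = 2, scale = 20)
)

full_weeks <- expenses %>%
  mutate(Week = floor_date(date, "week")) %>%  # weeks start on Sunday by default
  group_by(Week) %>%
  summarize(exp_sum = sum(amount), n_days = n()) %>%  # n_days: days observed in that week
  filter(n_days == 7)                                 # keep complete weeks only
```

With this data only the week starting 2015-12-27 is dropped; the week starting 2016-12-25 is kept because it is complete.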

The second option is to replace the first and last weekly sums with the mean of all weekly sums. If you do this, you should say so wherever the results are presented.

plot_df[c(1,nrow(plot_df)),"exp_sum"] <- mean(plot_df$exp_sum)
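Another way to use an average, sketched here under the assumption that the days observed in a partial week are representative of the whole week: take the mean daily amount per week and multiply it by 7. The names plot_df_scaled, n_days and exp_week_est are illustrative and not from the question.

```r
library(dplyr)
library(lubridate)

set.seed(1)  # only so the random example data is reproducible
expenses <- tibble(
  date = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = 1),
  amount = rgamma(length(date), shape = 2, scale = 20)
)

plot_df_scaled <- expenses %>%
  mutate(Week = floor_date(date, "week")) %>%
  group_by(Week) %>%
  summarize(
    exp_sum = sum(amount),           # raw weekly sum, low for partial weeks
    n_days = n(),                    # days observed in that week
    exp_week_est = mean(amount) * 7  # daily mean scaled up to a 7-day week
  )
```

For complete weeks exp_week_est equals exp_sum, so only the partial weeks change. An estimate built from two days is noisy, so it may be worth marking those points in the plot.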

The last option is to give each of them the value of its neighbouring week, that is, the week after the first one and the week before the last one:

plot_df[1,"exp_sum"] <- plot_df[2, "exp_sum"]
plot_df[nrow(plot_df), "exp_sum"] <- plot_df[nrow(plot_df)-1, "exp_sum"]

As I said, I would remove them.

Upvotes: 1
