Group the dataframe by week and add a column that calculates the variance/mean of that week in R

Question

I am a beginner in R and in coding in general.. I have a dataframe that looks like this:

 Date         Week          Spend 
1   2019-07-14 2019-07-08   1.81
2   2019-07-13 2019-07-08   1.31
3   2019-07-12 2019-07-08   1.56
4   2019-07-11 2019-07-08   0.45
5   2019-07-10 2019-07-08   5.00

The whole data has several weeks. First, I will need to group the data by week and sum the values.

For now I tried this:

df$nweek = (rep(1:15, each= 7))

Results:

   Date       Week     Spend     nweek
1   2019-07-14 2019-07-08  1.81      1
2   2019-07-13 2019-07-08  1.31      1
3   2019-07-12 2019-07-08  1.56      1
4   2019-07-11 2019-07-08  0.45      1
5   2019-07-10 2019-07-08  5.00      1
6   2019-07-09 2019-07-08  3.59      1
7   2019-07-08 2019-07-08  4.08      1
8   2019-07-07 2019-07-01  2.83      2
9   2019-07-06 2019-07-01  1.38      2
10  2019-07-05 2019-07-01  1.59      2
11  2019-07-04 2019-07-01  0.93      2
12  2019-07-03 2019-07-01  1.50      2
13  2019-07-02 2019-07-01  3.22      2
14  2019-07-01 2019-07-01  6.20      2
15  2019-06-30 2019-06-24  5.47      3
16  2019-06-29 2019-06-24  1.77      3

so that in this way I can have an "id" of each week. However, for some reason I cannot group my dataframe by this sequence of number I just produced:

df = df %>% group_by(nweek) %>%
  summarise (Spend = sum(Spend))

Instead, the result only gives me one row and sums the value(Spend) of the whole dataframe. I tried as.character on the "nweek" column and didnt work

Second,

After grouping the dataframe by week, I'm trying to calculate the mean and standard deviation each week, and return those values to new columns in the data frame. How can I do this?

Thanks

Ryan John · Accepted Answer

Try this:

library(tibble)

df <-  tibble::tribble(
           ~Date,       ~Week, ~Spend, ~nweek,
     "7/14/2019",  "7/8/2019",   1.81,      1,
     "7/13/2019",  "7/8/2019",   1.31,      1,
     "7/12/2019",  "7/8/2019",   1.56,      1,
     "7/11/2019",  "7/8/2019",   0.45,      1,
     "7/10/2019",  "7/8/2019",   5.95,      1,
      "7/9/2019",  "7/8/2019",   3.59,      1,
      "7/8/2019",  "7/8/2019",   4.08,      1,
      "7/7/2019",  "7/1/2019",   2.83,      2,
      "7/6/2019",  "7/1/2019",   1.38,      2,
      "7/5/2019",  "7/1/2019",   1.59,      2,
      "7/4/2019",  "7/1/2019",   0.93,      2,
      "7/3/2019",  "7/1/2019",    1.5,      2,
      "7/2/2019",  "7/1/2019",   3.22,      2,
      "7/1/2019",  "7/1/2019",    6.2,      2,
     "6/30/2019", "6/24/2019",   5.47,      3,
     "6/29/2019", "6/24/2019",   1.77,      3
     )

library(lubridate)
df$Date <-  lubridate::mdy(df$Date)
df$Week <-  lubridate::mdy(df$Week)
df$week_num <- lubridate::week(df$Date)

library(dplyr)
df %>%  
  group_by(week_num) %>% 
  summarise(spend_sum = sum(Spend),
            spend_sd = sd(Spend))

Group the dataframe by week and add a column that calculates the variance/mean of that week in R

Answers (2)

Related Questions