Sakura
Sakura

Reputation: 57

Group the dataframe by week and add a column that calculates the variance/mean of that week in R

I am a beginner in R and in coding in general.. I have a dataframe that looks like this:

 Date         Week          Spend 
1   2019-07-14 2019-07-08   1.81
2   2019-07-13 2019-07-08   1.31
3   2019-07-12 2019-07-08   1.56
4   2019-07-11 2019-07-08   0.45
5   2019-07-10 2019-07-08   5.00

The whole data has several weeks. First, I will need to group the data by week and sum the values.

For now I tried this:

df$nweek = (rep(1:15, each= 7))

Results:

   Date       Week     Spend     nweek
1   2019-07-14 2019-07-08  1.81      1
2   2019-07-13 2019-07-08  1.31      1
3   2019-07-12 2019-07-08  1.56      1
4   2019-07-11 2019-07-08  0.45      1
5   2019-07-10 2019-07-08  5.00      1
6   2019-07-09 2019-07-08  3.59      1
7   2019-07-08 2019-07-08  4.08      1
8   2019-07-07 2019-07-01  2.83      2
9   2019-07-06 2019-07-01  1.38      2
10  2019-07-05 2019-07-01  1.59      2
11  2019-07-04 2019-07-01  0.93      2
12  2019-07-03 2019-07-01  1.50      2
13  2019-07-02 2019-07-01  3.22      2
14  2019-07-01 2019-07-01  6.20      2
15  2019-06-30 2019-06-24  5.47      3
16  2019-06-29 2019-06-24  1.77      3

so that in this way I can have an "id" of each week. However, for some reason I cannot group my dataframe by this sequence of number I just produced:

df = df %>% group_by(nweek) %>%
  summarise (Spend = sum(Spend))

Instead, the result only gives me one row and sums the value(Spend) of the whole dataframe. I tried as.character on the "nweek" column and didnt work

Second,

After grouping the dataframe by week, I'm trying to calculate the mean and standard deviation each week, and return those values to new columns in the data frame. How can I do this?

Thanks

Upvotes: 0

Views: 283

Answers (2)

thus__
thus__

Reputation: 486

I would make one slight change to Ryan John's great solution. You can use mutate() to modify the Date, Week, and week_num columns all in one pipe.

df <-  tibble::tribble(
  ~Date,       ~Week, ~Spend, ~nweek,
  "7/14/2019",  "7/8/2019",   1.81,      1,
  "7/13/2019",  "7/8/2019",   1.31,      1,
  "7/12/2019",  "7/8/2019",   1.56,      1,
  "7/11/2019",  "7/8/2019",   0.45,      1,
  "7/10/2019",  "7/8/2019",   5.95,      1,
  "7/9/2019",  "7/8/2019",   3.59,      1,
  "7/8/2019",  "7/8/2019",   4.08,      1,
  "7/7/2019",  "7/1/2019",   2.83,      2,
  "7/6/2019",  "7/1/2019",   1.38,      2,
  "7/5/2019",  "7/1/2019",   1.59,      2,
  "7/4/2019",  "7/1/2019",   0.93,      2,
  "7/3/2019",  "7/1/2019",    1.5,      2,
  "7/2/2019",  "7/1/2019",   3.22,      2,
  "7/1/2019",  "7/1/2019",    6.2,      2,
  "6/30/2019", "6/24/2019",   5.47,      3,
  "6/29/2019", "6/24/2019",   1.77,      3
)

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:lubridate':
#> 
#>     intersect, setdiff, union
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df %>% 
  mutate(Date = mdy(Date),
         Week = mdy(Week),
         week_num = week(Date)) %>% 
  group_by(week_num) %>% 
  summarise(spend_sum = sum(Spend),
            spend_sd = sd(Spend))
#> # A tibble: 3 x 3
#>   week_num spend_sum spend_sd
#>      <dbl>     <dbl>    <dbl>
#> 1       26      13.4     2.38
#> 2       27      15.5     1.16
#> 3       28      14.7     2.00

Created on 2019-07-17 by the reprex package (v0.2.1)

Upvotes: 1

Ryan John
Ryan John

Reputation: 1430

Try this:

library(tibble)

df <-  tibble::tribble(
           ~Date,       ~Week, ~Spend, ~nweek,
     "7/14/2019",  "7/8/2019",   1.81,      1,
     "7/13/2019",  "7/8/2019",   1.31,      1,
     "7/12/2019",  "7/8/2019",   1.56,      1,
     "7/11/2019",  "7/8/2019",   0.45,      1,
     "7/10/2019",  "7/8/2019",   5.95,      1,
      "7/9/2019",  "7/8/2019",   3.59,      1,
      "7/8/2019",  "7/8/2019",   4.08,      1,
      "7/7/2019",  "7/1/2019",   2.83,      2,
      "7/6/2019",  "7/1/2019",   1.38,      2,
      "7/5/2019",  "7/1/2019",   1.59,      2,
      "7/4/2019",  "7/1/2019",   0.93,      2,
      "7/3/2019",  "7/1/2019",    1.5,      2,
      "7/2/2019",  "7/1/2019",   3.22,      2,
      "7/1/2019",  "7/1/2019",    6.2,      2,
     "6/30/2019", "6/24/2019",   5.47,      3,
     "6/29/2019", "6/24/2019",   1.77,      3
     )

library(lubridate)
df$Date <-  lubridate::mdy(df$Date)
df$Week <-  lubridate::mdy(df$Week)
df$week_num <- lubridate::week(df$Date)

library(dplyr)
df %>%  
  group_by(week_num) %>% 
  summarise(spend_sum = sum(Spend),
            spend_sd = sd(Spend))

Upvotes: 0

Related Questions