Reputation: 57
I am a beginner in R and in coding in general.. I have a dataframe that looks like this:
Date Week Spend
1 2019-07-14 2019-07-08 1.81
2 2019-07-13 2019-07-08 1.31
3 2019-07-12 2019-07-08 1.56
4 2019-07-11 2019-07-08 0.45
5 2019-07-10 2019-07-08 5.00
The whole data has several weeks. First, I will need to group the data by week and sum the values.
For now I tried this:
df$nweek = (rep(1:15, each= 7))
Results:
Date Week Spend nweek
1 2019-07-14 2019-07-08 1.81 1
2 2019-07-13 2019-07-08 1.31 1
3 2019-07-12 2019-07-08 1.56 1
4 2019-07-11 2019-07-08 0.45 1
5 2019-07-10 2019-07-08 5.00 1
6 2019-07-09 2019-07-08 3.59 1
7 2019-07-08 2019-07-08 4.08 1
8 2019-07-07 2019-07-01 2.83 2
9 2019-07-06 2019-07-01 1.38 2
10 2019-07-05 2019-07-01 1.59 2
11 2019-07-04 2019-07-01 0.93 2
12 2019-07-03 2019-07-01 1.50 2
13 2019-07-02 2019-07-01 3.22 2
14 2019-07-01 2019-07-01 6.20 2
15 2019-06-30 2019-06-24 5.47 3
16 2019-06-29 2019-06-24 1.77 3
so that in this way I can have an "id" of each week. However, for some reason I cannot group my dataframe by this sequence of number I just produced:
df = df %>% group_by(nweek) %>%
summarise (Spend = sum(Spend))
Instead, the result only gives me one row and sums the value(Spend) of the whole dataframe. I tried as.character on the "nweek" column and didnt work
Second,
After grouping the dataframe by week, I'm trying to calculate the mean and standard deviation each week, and return those values to new columns in the data frame. How can I do this?
Thanks
Upvotes: 0
Views: 283
Reputation: 486
I would make one slight change to Ryan John's great solution. You can use mutate()
to modify the Date, Week, and week_num columns all in one pipe.
df <- tibble::tribble(
~Date, ~Week, ~Spend, ~nweek,
"7/14/2019", "7/8/2019", 1.81, 1,
"7/13/2019", "7/8/2019", 1.31, 1,
"7/12/2019", "7/8/2019", 1.56, 1,
"7/11/2019", "7/8/2019", 0.45, 1,
"7/10/2019", "7/8/2019", 5.95, 1,
"7/9/2019", "7/8/2019", 3.59, 1,
"7/8/2019", "7/8/2019", 4.08, 1,
"7/7/2019", "7/1/2019", 2.83, 2,
"7/6/2019", "7/1/2019", 1.38, 2,
"7/5/2019", "7/1/2019", 1.59, 2,
"7/4/2019", "7/1/2019", 0.93, 2,
"7/3/2019", "7/1/2019", 1.5, 2,
"7/2/2019", "7/1/2019", 3.22, 2,
"7/1/2019", "7/1/2019", 6.2, 2,
"6/30/2019", "6/24/2019", 5.47, 3,
"6/29/2019", "6/24/2019", 1.77, 3
)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:lubridate':
#>
#> intersect, setdiff, union
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df %>%
mutate(Date = mdy(Date),
Week = mdy(Week),
week_num = week(Date)) %>%
group_by(week_num) %>%
summarise(spend_sum = sum(Spend),
spend_sd = sd(Spend))
#> # A tibble: 3 x 3
#> week_num spend_sum spend_sd
#> <dbl> <dbl> <dbl>
#> 1 26 13.4 2.38
#> 2 27 15.5 1.16
#> 3 28 14.7 2.00
Created on 2019-07-17 by the reprex package (v0.2.1)
Upvotes: 1
Reputation: 1430
Try this:
library(tibble)
df <- tibble::tribble(
~Date, ~Week, ~Spend, ~nweek,
"7/14/2019", "7/8/2019", 1.81, 1,
"7/13/2019", "7/8/2019", 1.31, 1,
"7/12/2019", "7/8/2019", 1.56, 1,
"7/11/2019", "7/8/2019", 0.45, 1,
"7/10/2019", "7/8/2019", 5.95, 1,
"7/9/2019", "7/8/2019", 3.59, 1,
"7/8/2019", "7/8/2019", 4.08, 1,
"7/7/2019", "7/1/2019", 2.83, 2,
"7/6/2019", "7/1/2019", 1.38, 2,
"7/5/2019", "7/1/2019", 1.59, 2,
"7/4/2019", "7/1/2019", 0.93, 2,
"7/3/2019", "7/1/2019", 1.5, 2,
"7/2/2019", "7/1/2019", 3.22, 2,
"7/1/2019", "7/1/2019", 6.2, 2,
"6/30/2019", "6/24/2019", 5.47, 3,
"6/29/2019", "6/24/2019", 1.77, 3
)
library(lubridate)
df$Date <- lubridate::mdy(df$Date)
df$Week <- lubridate::mdy(df$Week)
df$week_num <- lubridate::week(df$Date)
library(dplyr)
df %>%
group_by(week_num) %>%
summarise(spend_sum = sum(Spend),
spend_sd = sd(Spend))
Upvotes: 0