Reputation: 787
I have data like the following:
library(lubridate)
library(dplyr)
library(data.table)
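MWE <- data.table(
  # 90 days (2020-01-01 to 2020-03-30), 6 rows per day: 540 rows in total
  Date = rep(seq(ymd("2020-1-1"), ymd("2020-3-30"), by = "days"), each = 6),
  # the three countries repeat twice per day (once per transport type)
  Country = rep(c("France", "United States", "Germany"), times = 90 * 2),
  # three "Train" rows, then three "Cars" rows, for each day
  TransportType = rep(c("Train", "Cars"), each = 3, times = 90),
  Value = rnorm(90 * 6, 2, 3)
)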
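I want to create a new variable that is the mean of Value for each combination of Country, TransportType and day of the week.
The mean should be calculated using only the January and February data, but it should be present in the table for the whole period, March included.
I have managed the grouping part (or I think so, I am still checking):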
MWE_2 <- MWE %>%
.[,JourSem:=weekdays(Date)] %>%
.[,Moyenne:=mean(Value),by=.(Country,JourSem,TransportType)]
But I am unsure how to add the date condition to that. I thought I could get it with this:
MWE_3 <- MWE %>%
.[,JourSem:=weekdays(Date)] %>%
.[Date <= "2020-02-29",Moyenne:=mean(Value),by=.(Country,JourSem,TransportType)]
But then the value is missing for the March dates. That is logical, since those rows are filtered out, but it is not what I want.
Upvotes: 0
Views: 89
Reputation: 388797
We can first calculate the mean over January and February for each weekday and then join the result to the March data.
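library(data.table)
# add the weekday column by reference
MWE[, JourSem := weekdays(Date)]
# mean per weekday, computed on January and February only
d1 <- MWE[Date <= as.Date("2020-02-29"), .(Moyenne = mean(Value)), by = JourSem]
# attach those means to the March rows
MWE[Date > as.Date("2020-02-29")][d1, on = "JourSem"]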
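The join above returns only the March rows and averages by weekday alone. If the goal is to keep every row and to average per Country, JourSem and TransportType as in the question, one possible variant is an update join. The sketch below is only an illustration under that assumption: the name `ref` and the `set.seed` call are mine, and the sample table is rebuilt with 540 rows so that the block runs on its own.

```r
library(data.table)

set.seed(1)  # assumption: only here to make the illustration reproducible
days <- seq(as.Date("2020-01-01"), as.Date("2020-03-30"), by = "day")

# Rebuild the sample data: 6 rows per day (3 countries x 2 transport types)
MWE <- data.table(
  Date          = rep(days, each = 6),
  Country       = rep(c("France", "United States", "Germany"), times = length(days) * 2),
  TransportType = rep(c("Train", "Cars"), each = 3, times = length(days)),
  Value         = rnorm(length(days) * 6, 2, 3)
)
MWE[, JourSem := weekdays(Date)]

# Reference means, computed on the January and February rows only
# (`ref` is a hypothetical helper name)
ref <- MWE[Date <= as.Date("2020-02-29"),
           .(Moyenne = mean(Value)),
           by = .(Country, JourSem, TransportType)]

# Update join: every row of MWE, March included, receives the mean of its group
MWE[ref, on = .(Country, JourSem, TransportType), Moyenne := i.Moyenne]
```

Here the `i.` prefix refers to the column coming from `ref`, and `:=` adds `Moyenne` to `MWE` by reference, so no rows are dropped and no copy of the table is made.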
Upvotes: 1