Reputation: 909
Problem
I'm working on data with several variables including a "date" variable. The problem is that i need to have all dates for all couple of variable. Here is an example of my dataframe:
df <- data.frame(c("a","a","c","c","c"),c(15,16,0,18,95),c(0,16,0,18,95),c(2017.01,2017.02,2017.01,2017.02,2017.03))
colnames(df) <- c("PL","CDE","LIV","date")
I would like to create the row with date 2017.03 as date for the PL a and set the values CDE and LIV to 0.
I gave you here a simplified example but i have multiple values for PL and multiple missing date. I mean, one PL will have 2017.01 and 201.05 missing but another one will hahe 2017.01, 2017.02, 2017.03.
What i tried
I tried to create an array
"sem" of date from 2013.01 to 2013.53 and 2014.01 to 2014.53, ... 2017.01 to 2017.47 and then i created a for loop
over the PL variable andin the array "sem" i keep only the date that i don't have for the variable "PL" and i just do a cbind
of the array "sem" and the variable "PL" so it duplicates
it itself and then i create the variables "CDE" and "LIV" that i set to 0 and finnaly i do an rbind
But this solution takes too long i have over 38 000 variable "PL"
Upvotes: 0
Views: 155
Reputation: 42544
If I understand the question correctly, the OP wants to create a data frame which contains all possible combinations of the weeks from 2013.01 to 2017.47 and the distinct values of PL
.
This requires to use expand.grid()
from base R or CJ()
(cross join) from the data.table
package and to join the result with the exisiting df
.
library(data.table)
# cross join (only 2 years and 5 weeks for demonstration)
mDT <- CJ(yr = 2016:2017, wk = 1:5, PL = unique(df$PL))[
, date := yr + wk / 100][, !c("yr", "wk")]
# join with df
setDT(df)[mDT, on = .(PL, date)][
# replace NA by 0 as requested
is.na(CDE), CDE := 0][is.na(LIV), LIV := 0][]
PL CDE LIV date 1: a 0 0 2016.01 2: c 0 0 2016.01 3: a 0 0 2016.02 4: c 0 0 2016.02 5: a 0 0 2016.03 6: c 0 0 2016.03 7: a 0 0 2016.04 8: c 0 0 2016.04 9: a 0 0 2016.05 10: c 0 0 2016.05 11: a 15 0 2017.01 12: c 0 0 2017.01 13: a 16 16 2017.02 14: c 18 18 2017.02 15: a 0 0 2017.03 16: c 95 95 2017.03 17: a 0 0 2017.04 18: c 0 0 2017.04 19: a 0 0 2017.05 20: c 0 0 2017.05
The helper table mDT
looks like:
PL date 1: a 2016.01 2: c 2016.01 3: a 2016.02 4: c 2016.02 5: a 2016.03 6: c 2016.03 7: a 2016.04 8: c 2016.04 9: a 2016.05 10: c 2016.05 11: a 2017.01 12: c 2017.01 13: a 2017.02 14: c 2017.02 15: a 2017.03 16: c 2017.03 17: a 2017.04 18: c 2017.04 19: a 2017.05 20: c 2017.05
The full range of values in mDT
can be created by:
mDT <- CJ(yr = 2013:2017, wk = 1:53, PL = unique(df$PL))[
, date := yr + wk / 100][date <= 2017.47, !c("yr", "wk")]
mDT
PL date 1: a 2013.01 2: c 2013.01 3: a 2013.02 4: c 2013.02 5: a 2013.03 --- 514: c 2017.45 515: a 2017.46 516: c 2017.46 517: a 2017.47 518: c 2017.47
Upvotes: 1