Orhan Yazar
Orhan Yazar

Reputation: 909

Create new rows based on date variable

Problem

I'm working on data with several variables including a "date" variable. The problem is that i need to have all dates for all couple of variable. Here is an example of my dataframe:

df <- data.frame(c("a","a","c","c","c"),c(15,16,0,18,95),c(0,16,0,18,95),c(2017.01,2017.02,2017.01,2017.02,2017.03))
colnames(df) <- c("PL","CDE","LIV","date")

I would like to create the row with date 2017.03 as date for the PL a and set the values CDE and LIV to 0.

I gave you here a simplified example but i have multiple values for PL and multiple missing date. I mean, one PL will have 2017.01 and 201.05 missing but another one will hahe 2017.01, 2017.02, 2017.03.

What i tried

I tried to create an array "sem" of date from 2013.01 to 2013.53 and 2014.01 to 2014.53, ... 2017.01 to 2017.47 and then i created a for loop over the PL variable andin the array "sem" i keep only the date that i don't have for the variable "PL" and i just do a cbind of the array "sem" and the variable "PL" so it duplicates it itself and then i create the variables "CDE" and "LIV" that i set to 0 and finnaly i do an rbind

But this solution takes too long i have over 38 000 variable "PL"

Upvotes: 0

Views: 155

Answers (1)

Uwe
Uwe

Reputation: 42544

If I understand the question correctly, the OP wants to create a data frame which contains all possible combinations of the weeks from 2013.01 to 2017.47 and the distinct values of PL.

This requires to use expand.grid() from base R or CJ() (cross join) from the data.table package and to join the result with the exisiting df.

library(data.table)
# cross join (only 2 years and 5 weeks for demonstration)
mDT <- CJ(yr = 2016:2017, wk = 1:5, PL = unique(df$PL))[
  , date := yr + wk / 100][, !c("yr", "wk")]
# join with df
setDT(df)[mDT, on = .(PL, date)][
  # replace NA by 0 as requested
  is.na(CDE), CDE := 0][is.na(LIV), LIV := 0][]
    PL CDE LIV    date
 1:  a   0   0 2016.01
 2:  c   0   0 2016.01
 3:  a   0   0 2016.02
 4:  c   0   0 2016.02
 5:  a   0   0 2016.03
 6:  c   0   0 2016.03
 7:  a   0   0 2016.04
 8:  c   0   0 2016.04
 9:  a   0   0 2016.05
10:  c   0   0 2016.05
11:  a  15   0 2017.01
12:  c   0   0 2017.01
13:  a  16  16 2017.02
14:  c  18  18 2017.02
15:  a   0   0 2017.03
16:  c  95  95 2017.03
17:  a   0   0 2017.04
18:  c   0   0 2017.04
19:  a   0   0 2017.05
20:  c   0   0 2017.05

The helper table mDT looks like:

    PL    date
 1:  a 2016.01
 2:  c 2016.01
 3:  a 2016.02
 4:  c 2016.02
 5:  a 2016.03
 6:  c 2016.03
 7:  a 2016.04
 8:  c 2016.04
 9:  a 2016.05
10:  c 2016.05
11:  a 2017.01
12:  c 2017.01
13:  a 2017.02
14:  c 2017.02
15:  a 2017.03
16:  c 2017.03
17:  a 2017.04
18:  c 2017.04
19:  a 2017.05
20:  c 2017.05

The full range of values in mDT can be created by:

mDT <- CJ(yr = 2013:2017, wk = 1:53, PL = unique(df$PL))[
  , date := yr + wk / 100][date <= 2017.47, !c("yr", "wk")]
mDT
     PL    date
  1:  a 2013.01
  2:  c 2013.01
  3:  a 2013.02
  4:  c 2013.02
  5:  a 2013.03
 ---           
514:  c 2017.45
515:  a 2017.46
516:  c 2017.46
517:  a 2017.47
518:  c 2017.47

Upvotes: 1

Related Questions