leoOrion

Reputation: 1957

Generating test data in R

I am trying to generate this table as one of the inputs to a test.

        id                 diff          d
 1:      1                    2 2020-07-31
 2:      1                    1 2020-08-01
 3:      1                    1 2020-08-02
 4:      1                    1 2020-08-03
 5:      1                    1 2020-08-04
 6:      2                    2 2020-07-31
 7:      2                    1 2020-08-01
 8:      2                    1 2020-08-02
 9:      2                    1 2020-08-03
10:      2                    1 2020-08-04
11:      3                    2 2020-07-31
12:      3                    1 2020-08-01
13:      3                    1 2020-08-02
14:      3                    1 2020-08-03
15:      3                    1 2020-08-04
16:      4                    2 2020-07-31
17:      4                    1 2020-08-01
18:      4                    1 2020-08-02
19:      4                    1 2020-08-03
20:      4                    1 2020-08-04
21:      5                    2 2020-07-31
22:      5                    1 2020-08-01
23:      5                    1 2020-08-02
24:      5                    1 2020-08-03
25:      5                    1 2020-08-04
        id                 diff          d

I have done it like this:

library(data.table)

input1 <- data.table(id = as.character(1:5), diff = 1)
input1 <- input1[, .(d = seq(as.Date('2020-07-31'), by = 'days', length.out = 5)), by = .(id, diff)]
# Hardcoded: 2020-07-31 is a Friday
input1[d == as.Date('2020-07-31'), diff := 2]

diff is basically the number of days to the next weekday. E.g., 31 Jul 2020 is a Friday, hence diff is 2, which is the diff to the next weekday, Monday. For the other dates it will be 1.
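The weekday claim can be checked without relying on day names. This is a minimal sketch in base R; the variable names wday and is_friday are only illustrative. as.POSIXlt()$wday numbers the days from 0 (Sunday) to 6 (Saturday), so the result does not depend on the session locale.

```r
d <- as.Date("2020-07-31")

# wday runs from 0 (Sunday) to 6 (Saturday); 5 is Friday
wday <- as.POSIXlt(d)$wday
is_friday <- wday == 5
```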

I personally don't like that I had to generate the date sequence for each of the ids separately, or that I had to hardcode the diff for 31st July in the input. Is there a more generic way of doing this without the hardcoding?

Upvotes: 0

Views: 92

Answers (1)

Ronak Shah

Reputation: 389065

We can create all combinations of dates and ids using crossing, and create the diff column based on whether the weekday is "Friday".

library(dplyr)

tidyr::crossing(id = 1:5, d = seq(as.Date('2020-07-31'), 
                          by='days', length.out = 5)) %>%
    mutate(diff = as.integer(weekdays(d) == 'Friday') + 1)

Similar logic using base R expand.grid (here id varies fastest, so the rows come out ordered by d rather than by id):

transform(expand.grid(id = 1:5, 
                      d = seq(as.Date('2020-07-31'), by='days', length.out = 5)), 
          diff = as.integer(weekdays(d) == 'Friday') + 1)

And with CJ in data.table:

library(data.table)
df <- CJ(id = 1:5, d = seq(as.Date('2020-07-31'), by='days', length.out = 5))
df[, diff := as.integer(weekdays(d) == 'Friday') + 1]
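One caveat: weekdays() returns day names in the current locale, so the comparison with 'Friday' only matches in an English locale. A possible locale-independent variant of the data.table version is sketched below. It swaps in as.POSIXlt()$wday (0 = Sunday to 6 = Saturday) for weekdays(); the dates variable is only introduced here for readability.

```r
library(data.table)

# Same 5-day window as in the question
dates <- seq(as.Date("2020-07-31"), by = "days", length.out = 5)

# All id/date combinations, sorted by id and then d
df <- CJ(id = 1:5, d = dates)

# wday == 5 means Friday, independent of locale: Friday gets 2, other days 1
df[, diff := as.integer(as.POSIXlt(d)$wday == 5) + 1L]
```

If the test needs id as character, as in the question, as.character(1:5) can be passed to CJ instead of 1:5.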

Upvotes: 3
