Generating different iterations of a dataset

Question

I have a dataset of items (rows), each categorized with an integer from 0 to 4 (the levels of my discrete variable). I have two years of data, 1980 and 1996 (columns).

df <- read.table(text = "
1980  1996
1    1
2    4
4    1", header = T)

My goal is to generate data for the intermediate years 1984, 1988 and 1992.

df.new <- data.frame(X1980 = NULL, X1984 = NULL, X1988 = NULL, X1992 = NULL, X1996 = NULL)

However, for this simulated data to be realistic, it must follow 3 laws:

Items assigned the same integer in 1980 and 1996 remain the same throughout the whole period.
Items which increase or decrease from 1980 to 1996 can only change by one integer in a given time step (items cannot skip integers).
Items can either only increase or only decrease (items have to be monotonic).
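
The three laws can be written as a check on a single row's values in time order. The following is a minimal base R sketch; `obeys_laws` is a hypothetical helper name, not part of the original code.

```r
# Hypothetical helper: TRUE if one row's trajectory obeys all three laws.
# `x` holds the values in time order, e.g. c(2, 3, 4, 4, 4).
obeys_laws <- function(x) {
  steps <- diff(x)
  # Law 1: identical endpoints imply no change at all
  law1 <- x[1] != x[length(x)] || all(steps == 0)
  # Law 2: never move by more than one integer per time step
  law2 <- all(abs(steps) <= 1)
  # Law 3: monotonic, i.e. all steps >= 0 or all steps <= 0
  law3 <- all(steps >= 0) || all(steps <= 0)
  law1 && law2 && law3
}

obeys_laws(c(2, 3, 4, 4, 4))  # TRUE
obeys_laws(c(1, 1, 1, 1, 4))  # FALSE: jumps by 3 in the last step
obeys_laws(c(3, 1, 2, 3, 3))  # FALSE: endpoints equal but values change
```

Applied row by row (for example with `apply(df.new, 1, obeys_laws)`), this kind of check flags any item that breaks a law.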

To achieve this I am using:

for(i in 1:nrow(df)){

  lst <- ifelse(df$X1980[i] > df$X1996[i],
                list(sort(sample(df$X1980[i]:df$X1996[i], 3, replace = T), decreasing = T)),
                list(sort(sample(df$X1980[i]:df$X1996[i], 3, replace = T), decreasing = F)))

  lst <- c(df$X1980[i], unlist(lst), df$X1996[i])

  df.new <- rbind(df.new, data.frame(X1980 = lst[1], 
                                 X1984 = lst[2], 
                                 X1988 = lst[3], 
                                 X1992 = lst[4], 
                                 X1996 = lst[5]))
}

This seems to work well, since df.new produces:

  X1980 X1984 X1988 X1992 X1996
1     1     1     1     1     1
2     2     3     4     4     4
3     4     4     3     2     1
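
A caveat worth checking: the printed result is a single random draw, and the loop does not by itself enforce the one-step rule. In addition, `sample()` treats a single number `n` as `1:n`, so an unchanged item with a value above 1 can be altered. The example data happen to avoid this because the only unchanged item has value 1. A small base R sketch of both effects (the seed and the values used are illustrative assumptions):

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# sample() on a single number n draws from 1:n, not from n alone,
# so an item that is 3 in both years can receive 1s and 2s.
draws <- sample(3:3, 1000, replace = TRUE)
sort(unique(draws))   # more values than just 3

# Sorted draws between 1 and 4 can also jump by more than one integer,
# e.g. the intermediate values 1, 1, 1 followed by the 1996 value 4.
path <- c(1, sort(c(1, 1, 1)), 4)
max(abs(diff(path)))  # 3
```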

There are of course multiple variations of this dataset that also follow my 3 laws.

How should I write a loop that allows me to generate sim = 1000 law-abiding iterations of this dataset?

And how can I be sure that no item (in any dataset) breaks any of my 3 laws?

I am currently trying results <- foreach (i = 1:sim, .combine="df") %dopar% before the loop, but have been unsuccessful so far.

Any help or advice will be greatly appreciated.

F. Privé · Accepted Answer

You can do:

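library(foreach)
# %dopar% needs a registered parallel backend, e.g. doParallel::registerDoParallel();
# without one, foreach warns and runs sequentially.
# Outer loop: one simulated dataset per iteration (100 here; use 1:sim for sim datasets).
results <- foreach(s = 1:100) %dopar% {
  # Inner loop: build the dataset row by row and bind the rows together
  foreach(i = 1:nrow(df), .combine = "rbind") %do% {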

    lst <- ifelse(df$X1980[i] > df$X1996[i], 
                  list(sort(sample(df$X1980[i]:df$X1996[i],3,replace = T), decreasing = T)),
                  list(sort(sample(df$X1980[i]:df$X1996[i],3,replace = T), decreasing = F)))

    lst <- c(df$X1980[i], unlist(lst), df$X1996[i])

    data.frame(X1980 = lst[1], 
               X1984 = lst[2], 
               X1988 = lst[3], 
               X1992 = lst[4], 
               X1996 = lst[5])
  }
}
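# results is a list with one data frame per simulation;
# this stacks them all into a single data frame
do.call("rbind", results)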

foreach works like lapply: it collects whatever each iteration returns (the value of the last expression in the block) into a list.
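
Sorting random draws does not guarantee that an item moves by at most one integer per time step. As an alternative (a different technique from the sampling used above, not the method in this answer), here is a minimal base R sketch that picks at random which time steps carry a change of one. The helper names `simulate_path` and `simulate_dataset` are hypothetical, and `replicate` stands in for the parallel loop.

```r
# Hypothetical helper: one law-abiding path from `from` to `to`
# over `n_steps` time steps (4 steps span 1980, 1984, ..., 1996).
simulate_path <- function(from, to, n_steps = 4) {
  d <- abs(to - from)
  stopifnot(d <= n_steps)             # otherwise law 2 cannot hold
  moves <- integer(n_steps)           # 0 = stay in place
  moves[sample.int(n_steps, d)] <- sign(to - from)  # d steps of +1 or -1
  from + c(0, cumsum(moves))
}

# Hypothetical helper: one simulated dataset for all rows of `df`
simulate_dataset <- function(df) {
  paths <- t(mapply(simulate_path, df$X1980, df$X1996))
  out <- as.data.frame(paths)
  names(out) <- paste0("X", seq(1980, 1996, by = 4))
  out
}

df <- data.frame(X1980 = c(1, 2, 4), X1996 = c(1, 4, 1))
sim <- 1000
results <- replicate(sim, simulate_dataset(df), simplify = FALSE)

# Check every row of every simulated dataset:
# steps of at most one integer, and all steps in the same direction
ok <- vapply(results, function(d) {
  steps <- t(apply(as.matrix(d), 1, diff))
  all(abs(steps) <= 1) &&
    all(apply(steps, 1, function(s) all(s >= 0) || all(s <= 0)))
}, logical(1))
all(ok)
```

Because each path is built from steps of -1, 0 or +1 in a single direction, the laws hold by construction, and the final check is only a safeguard. Items with equal endpoints get zero moves, so they stay constant.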
