Sergio Henriques

Reputation: 135

Generating different iterations of a dataset

I have a dataset made of items (rows), each categorized only with an integer from 0 to 4 (which represents the degree of my discrete variable). I have data for two years, 1980 and 1996 (columns).

df <- read.table(text = "
1980  1996
1    1
2    4
4    1", header = T)

My goal is to generate data for the intermediate years 1984, 1988 and 1992.

df.new <- data.frame(X1980 = NULL, X1984 = NULL, X1988 = NULL, X1992 = NULL, X1996 = NULL)

However, for this virtual data to be reality-based, it must follow 3 laws:

To achieve this I am using:

for (i in 1:nrow(df)) {

  lst <- ifelse(df$X1980[i] > df$X1996[i],
                list(sort(sample(df$X1980[i]:df$X1996[i], 3, replace = TRUE), decreasing = TRUE)),
                list(sort(sample(df$X1980[i]:df$X1996[i], 3, replace = TRUE), decreasing = FALSE)))

  lst <- c(df$X1980[i], unlist(lst), df$X1996[i])

  df.new <- rbind(df.new, data.frame(X1980 = lst[1],
                                     X1984 = lst[2],
                                     X1988 = lst[3],
                                     X1992 = lst[4],
                                     X1996 = lst[5]))
}

Which seems to work well, since df.new produces:

  X1980 X1984 X1988 X1992 X1996
1     1     1     1     1     1
2     2     3     4     4     4
3     4     4     3     2     1

There are of course multiple variations of this dataset that also follow my 3 laws.

How should I write a loop that allows me to generate sim = 1000 law abiding iterations of this dataset?

And how can I be sure that no item (in any dataset) breaks any of my 3 laws?

I am currently trying to put results <- foreach (i = 1:sim, .combine="df") %dopar% before the loop, but have been unsuccessful so far.

Any help or advice will be greatly appreciated.

Upvotes: 1

Views: 53

Answers (1)

F. Privé

Reputation: 11728

You can do:

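library(foreach)
# Without a registered parallel backend, %dopar% runs sequentially with a warning.
# Outer loop: one simulated dataset per iteration (s is the simulation index).
results <- foreach(s = 1:100) %dopar% {
  # Inner loop: build one dataset row by row and stack the rows with rbind.
  foreach(i = 1:nrow(df), .combine = "rbind") %do% {

    lst <- ifelse(df$X1980[i] > df$X1996[i],
                  list(sort(sample(df$X1980[i]:df$X1996[i], 3, replace = TRUE), decreasing = TRUE)),
                  list(sort(sample(df$X1980[i]:df$X1996[i], 3, replace = TRUE), decreasing = FALSE)))

    lst <- c(df$X1980[i], unlist(lst), df$X1996[i])

    data.frame(X1980 = lst[1],
               X1984 = lst[2],
               X1988 = lst[3],
               X1992 = lst[4],
               X1996 = lst[5])
  }
}
# Stack all simulated datasets into one data frame.
do.call("rbind", results)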

foreach works like lapply: it collects the value returned by each iteration (the last expression evaluated in the body) into a list.
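
If parallelism is not needed, the same idea can be sketched in base R with replicate(). This is only a sketch under assumptions: the helper names draw_between, simulate_once and is_consistent are made up for illustration, and since the 3 laws are not listed in the post, the check below only tests what the posted loop appears to enforce (endpoints unchanged, each row monotone from 1980 to 1996). It also sidesteps a known sample() pitfall: when given a single number k >= 1, sample() draws from 1:k, so a row with 4 in both years could otherwise receive values below 4.

```r
# Rebuild the example data from the question (same column names as df there).
df <- data.frame(X1980 = c(1, 2, 4), X1996 = c(1, 4, 1))

# Draw n values with replacement from the integer range a..b.
# Indexing via sample.int() avoids sample() treating a single number k as 1:k.
draw_between <- function(a, b, n = 3) {
  pool <- a:b
  pool[sample.int(length(pool), n, replace = TRUE)]
}

# One simulated dataset: for each row, three intermediate values sorted in
# the direction of travel from the 1980 value to the 1996 value.
simulate_once <- function(df) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    a <- df$X1980[i]
    b <- df$X1996[i]
    mid <- sort(draw_between(a, b), decreasing = a > b)
    data.frame(X1980 = a, X1984 = mid[1], X1988 = mid[2],
               X1992 = mid[3], X1996 = b)
  })
  do.call(rbind, rows)
}

# Assumed stand-in for the laws: endpoints unchanged and every row monotone.
is_consistent <- function(sim_df, df) {
  m <- as.matrix(sim_df)
  steps <- m[, -1, drop = FALSE] - m[, -ncol(m), drop = FALSE]
  monotone <- apply(steps, 1, function(d) all(d >= 0) || all(d <= 0))
  all(sim_df$X1980 == df$X1980, sim_df$X1996 == df$X1996, monotone)
}

sim <- 1000
set.seed(1)  # only to make this sketch reproducible
results <- replicate(sim, simulate_once(df), simplify = FALSE)

# TRUE only if every simulated dataset passes the assumed checks.
all_ok <- all(vapply(results, is_consistent, logical(1), df = df))
```

Keeping results as a list of 1000 data frames (rather than binding them into one) makes it easy to check or analyse each simulated dataset separately; the actual laws can be swapped into is_consistent once they are written down as conditions.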

Upvotes: 0
