Reputation: 2126

Combine dataframes as function output in a single dataframe in R

I would like to combine multiple dataframes, as output of a function, into one big dataframe in R.

I am simulating data within a function, e.g.:

set.seed(123)

x <- function(){
return( data.frame( matrix(rnorm(10, 1, .5), ncol=2) ) )
}

I would like to run multiple simulations and tie the dataframes together.

Attempt

set.seed(123)

x_improved <- function(sim_nr){
  df <- data.frame( matrix(rnorm(10, 1, .5), ncol=2) )  # simulate data
  sim_nr <- rep(sim_nr, length(df[,1]))                 # add reference number
  df <- cbind(df, sim_nr)                               # bind columns
  return(df)
}

list_dataframes <- lapply(c(1,2,3), x_improved)         # create list of dataframes

df <- do.call("rbind", list_dataframes)                 # convert list to dataframe

The code above does this; see "Expected output" below.

Expected output:

> df
          X1        X2 sim_nr
1  0.4660881 0.1566533      1
2  0.8910125 1.4188935      1
3  0.4869978 1.0766866      1
4  0.6355544 0.4309315      1
5  0.6874804 1.6269075      1
6  1.2132321 1.3443201      2
7  0.8524643 1.2769588      2
8  1.4475628 0.9690441      2
9  1.4390667 0.8470187      2
10 1.4107905 0.8097645      2
11 0.6526465 0.4384457      3
12 0.8960414 0.7985576      3
13 0.3673018 0.7666723      3
14 2.0844780 1.3899826      3
15 1.6039810 0.9583155      3

Question:

Is this the proper (or R) way to address this problem? Are there more efficient (or convenient) solutions?

Upvotes: 1

Answers (4)

Cole

Reputation: 11255

Another approach would be to use an array, which can be more performant if you need to do a lot of grouping operations.

set.seed(123)
replicate(3, matrix(rnorm(10, 1, 0.5), ncol = 2))
, , 1

          [,1]      [,2]
[1,] 0.7197622 1.8575325
[2,] 0.8849113 1.2304581
[3,] 1.7793542 0.3674694
[4,] 1.0352542 0.6565736
[5,] 1.0646439 0.7771690

, , 2

          [,1]       [,2]
[1,] 1.6120409 1.89345657
[2,] 1.1799069 1.24892524
[3,] 1.2003857 0.01669142
[4,] 1.0553414 1.35067795
[5,] 0.7220794 0.76360430

, , 3

          [,1]      [,2]
[1,] 0.4660881 0.1566533
[2,] 0.8910125 1.4188935
[3,] 0.4869978 1.0766866
[4,] 0.6355544 0.4309315
[5,] 0.6874804 1.6269075
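
As a rough illustration of a grouped operation on such an array (a minimal sketch; the names arr and col_means are only illustrative), the per-simulation column means can be computed without splitting the data:

```r
set.seed(123)

# 5 rows x 2 columns x 3 simulations, built the same way as above
arr <- replicate(3, matrix(rnorm(10, 1, 0.5), ncol = 2))

# Mean of each column within each simulation.
# MARGIN = c(2, 3) keeps the column and simulation dimensions,
# so the result is a 2 x 3 matrix (columns x simulations).
col_means <- apply(arr, c(2, 3), mean)

# colMeans() averages over the first dimension of an array,
# which gives the same 2 x 3 result without an explicit apply()
col_means_alt <- colMeans(arr)
```

Each column of col_means corresponds to one simulation, so col_means[, 1] holds the two column means of the first simulated matrix.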

Or, if you want a data.frame, it is often faster to do all of your rnorm simulations at once. Note that even with the seed set, this isn't an exact match: the matrix is filled column by column, so the ordering is slightly different.

set.seed(123)
n_sim <- 3
data.frame(matrix(rnorm(10 * n_sim, 1, 0.5), ncol = 2),
           sim_nr = rep(seq_len(n_sim), each = 5)
  )
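
If the per-simulation ordering matters, one possible workaround (a sketch, not part of the original answer; n_row, n_col, arr, stacked, df_all and df_ref are illustrative names) is to fill a three-dimensional array from the single rnorm call and rearrange it before flattening. Under the same seed this should reproduce the values of the one-simulation-at-a-time approach:

```r
set.seed(123)
n_sim <- 3   # number of simulations
n_row <- 5   # rows per simulation
n_col <- 2   # columns per simulation

# One rnorm() call. A 3-d array is filled column by column within each
# simulation, which is the same order as repeated matrix(rnorm(10), ncol = 2).
arr <- array(rnorm(n_row * n_col * n_sim, 1, 0.5), dim = c(n_row, n_col, n_sim))

# Reorder the dimensions to (row, simulation, column), then flatten so that
# the simulations are stacked on top of each other in n_col columns.
stacked <- matrix(aperm(arr, c(1, 3, 2)), ncol = n_col)

df_all <- data.frame(stacked, sim_nr = rep(seq_len(n_sim), each = n_row))

# Reference: the same data built one simulation at a time
set.seed(123)
df_ref <- do.call(rbind, lapply(seq_len(n_sim), function(i) {
  data.frame(matrix(rnorm(n_row * n_col, 1, 0.5), ncol = n_col), sim_nr = i)
}))

# Compare the two results, ignoring attributes such as row names
all.equal(df_all, df_ref, check.attributes = FALSE)
```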

Upvotes: 3

Manuel F.

Reputation: 255

The simplest solution is to use rbindlist from the data.table library:

> library(data.table)
> rbindlist(list_dataframes)

You can of course apply it to your list_dataframes either outside or inside a "for" loop.
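
As a sketch of how this could look end to end (assuming data.table is installed; sims and dt are illustrative names), the idcol argument of rbindlist can add the simulation number, so the simulation function does not have to:

```r
library(data.table)

# Simulation function from the question: a 5 x 2 data frame of normal draws
x <- function() data.frame(matrix(rnorm(10, 1, 0.5), ncol = 2))

set.seed(123)
# simplify = FALSE keeps the results as a list of three data frames
sims <- replicate(3, x(), simplify = FALSE)

# For an unnamed list, idcol records each element's position in the list
dt <- rbindlist(sims, idcol = "sim_nr")
```

The result is a data.table with the sim_nr column first, followed by X1 and X2.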

Upvotes: 0

A. Suliman

Reputation: 13125

Using the purrr library:

purrr::map_df(c(1,2,3), ~data.frame(matrix(rnorm(10, 1, .5), ncol=2)), .id='sim_nr')
# Using the x function, it would be:
purrr::map_df(c(1,2,3), ~x(), .id='sim_nr')

Upvotes: 2

Ronak Shah

Reputation: 388817

One way to improve it, at least in number of lines, is to use transform, so that the function x_improved becomes a one-liner:

set.seed(123)
x_improved <- function(sim_nr){
   transform(data.frame(matrix(rnorm(10, 1,.5), ncol=2)), sim_nr = sim_nr)
}

do.call(rbind, lapply(1:3, x_improved))


#          X1         X2 sim_nr
#1  0.7197622 1.85753249      1
#2  0.8849113 1.23045810      1
#3  1.7793542 0.36746938      1
#4  1.0352542 0.65657357      1
#5  1.0646439 0.77716901      1
#6  1.6120409 1.89345657      2
#7  1.1799069 1.24892524      2
#8  1.2003857 0.01669142      2
#9  1.0553414 1.35067795      2
#10 0.7220794 0.76360430      2
#11 0.4660881 0.15665334      3
#12 0.8910125 1.41889352      3
#13 0.4869978 1.07668656      3
#14 0.6355544 0.43093153      3
#15 0.6874804 1.62690746      3

Or, depending on your use case, you could construct the data frame all at once.

num <- 1:3
transform(data.frame(matrix(rnorm(10 * length(num), 1,.5), ncol=2)), 
          sim_nr = rep(num, each = 10/2))

Upvotes: 2
