Reputation: 2126
I would like to combine multiple dataframes, as output of a function, into one big dataframe in R.
I am simulating data within a function, e.g.:
set.seed(123)
x <- function(){
return( data.frame( matrix(rnorm(10, 1, .5), ncol=2) ) )
}
I would like to run multiple simulations and tie the dataframes together.
Attempt
set.seed(123)
x_improved <- function(sim_nr){
df <- data.frame( matrix(rnorm(10, 1, .5), ncol=2) ) # simulate data
sim_nr <- rep(sim_nr, nrow(df)) # add reference number
df <- cbind(df, sim_nr) # bind columns
return(df)
}
list_dataframes <- lapply(c(1,2,3), x_improved) # create list of dataframes
df <- do.call("rbind", list_dataframes) # convert list to dataframe
The code above does this; see "Expected output" below.
Expected output:
> df
X1 X2 sim_nr
1 0.4660881 0.1566533 1
2 0.8910125 1.4188935 1
3 0.4869978 1.0766866 1
4 0.6355544 0.4309315 1
5 0.6874804 1.6269075 1
6 1.2132321 1.3443201 2
7 0.8524643 1.2769588 2
8 1.4475628 0.9690441 2
9 1.4390667 0.8470187 2
10 1.4107905 0.8097645 2
11 0.6526465 0.4384457 3
12 0.8960414 0.7985576 3
13 0.3673018 0.7666723 3
14 2.0844780 1.3899826 3
15 1.6039810 0.9583155 3
Question:
Is this the proper (or R) way to address this problem? Are there more efficient (or convenient) solutions?
Upvotes: 1
Views: 329
Reputation: 11255
Another approach would be to use an array, which can be more performant if you need to do a lot of grouping operations.
set.seed(123)
replicate(3, matrix(rnorm(10, 1, 0.5), ncol = 2))
, , 1
[,1] [,2]
[1,] 0.7197622 1.8575325
[2,] 0.8849113 1.2304581
[3,] 1.7793542 0.3674694
[4,] 1.0352542 0.6565736
[5,] 1.0646439 0.7771690
, , 2
[,1] [,2]
[1,] 1.6120409 1.89345657
[2,] 1.1799069 1.24892524
[3,] 1.2003857 0.01669142
[4,] 1.0553414 1.35067795
[5,] 0.7220794 0.76360430
, , 3
[,1] [,2]
[1,] 0.4660881 0.1566533
[2,] 0.8910125 1.4188935
[3,] 0.4869978 1.0766866
[4,] 0.6355544 0.4309315
[5,] 0.6874804 1.6269075
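As a minimal sketch of what a grouping operation on such an array could look like, the snippet below computes the mean of each column within each simulation. The names arr, sim_means and sim_means_fast are made up for this illustration.

```r
set.seed(123)

# 5 x 2 x 3 array: rows x columns x simulations
arr <- replicate(3, matrix(rnorm(10, 1, 0.5), ncol = 2))

# Mean of each column within each simulation.
# Result is a 2 x 3 matrix: one row per column, one column per simulation.
sim_means <- apply(arr, c(2, 3), mean)

# colMeans() averages over the first dimension and gives the same values
sim_means_fast <- colMeans(arr)
```

Here no splitting or binding of data frames is needed; the simulation index is simply the third dimension of the array.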
Or, if you want a data.frame, it's oftentimes faster to do all of your rnorm simulations at once. Note that even with the same seed this isn't an exact match: the matrix is filled column by column, so the values end up in a different order.
set.seed(123)
n_sim <- 3
data.frame(matrix(rnorm(10 * n_sim, 1, 0.5), ncol = 2),
sim_nr = rep(seq_len(n_sim), each = 5)
)
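To see why the ordering differs, it can help to replace the random draws with their draw index. This is only an illustrative sketch; draw_order and layout_all_at_once are names invented here.

```r
n_sim <- 3

# Stand-in for the rnorm() draws: the value is the position of the draw
draw_order <- matrix(seq_len(10 * n_sim), ncol = 2)

layout_all_at_once <- data.frame(draw_order,
                                 sim_nr = rep(seq_len(n_sim), each = 5))

head(layout_all_at_once, 3)
#   X1 X2 sim_nr
# 1  1 16      1
# 2  2 17      1
# 3  3 18      1
```

Column X1 holds draws 1 to 15 and column X2 holds draws 16 to 30, whereas simulating one data frame at a time would put draws 1 to 10 into simulation 1. The rows are still independent draws from the same distribution; only the assignment of draws to cells changes.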
Upvotes: 3
Reputation: 255
The simplest solution is to use rbindlist from the data.table package:
> library(data.table)
> rbindlist(list_dataframes)
You can of course do it for your list_dataframes either outside or inside of the "for" loop.
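As a sketch of how this might look end to end, rbindlist also accepts an idcol argument, which can add the simulation number so the simulating function does not have to. This assumes the data.table package is installed; sims and combined are names chosen for this example.

```r
library(data.table)

set.seed(123)

# List of three simulated data frames, without a sim_nr column
sims <- lapply(1:3, function(i) {
  data.frame(matrix(rnorm(10, 1, 0.5), ncol = 2))
})

# Stack them; for an unnamed list, idcol stores the list position
combined <- rbindlist(sims, idcol = "sim_nr")
```

Note that the result is a data.table, and the sim_nr column comes first rather than last.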
Upvotes: 0
Reputation: 13125
Using the purrr library:
purrr::map_df(c(1,2,3), ~data.frame(matrix(rnorm(10, 1, .5), ncol=2)), .id='sim_nr')
#Using the x function it would be
purrr::map_df(c(1,2,3), ~x() , .id='sim_nr')
Upvotes: 2
Reputation: 388817
One way to improve it, at least in number of lines, would be to use transform, so that the function x_improved becomes a one-liner:
set.seed(123)
x_improved <- function(sim_nr){
transform(data.frame(matrix(rnorm(10, 1, .5), ncol = 2)), sim_nr = sim_nr)
}
do.call(rbind, lapply(1:3, x_improved))
# X1 X2 sim_nr
#1 0.7197622 1.85753249 1
#2 0.8849113 1.23045810 1
#3 1.7793542 0.36746938 1
#4 1.0352542 0.65657357 1
#5 1.0646439 0.77716901 1
#6 1.6120409 1.89345657 2
#7 1.1799069 1.24892524 2
#8 1.2003857 0.01669142 2
#9 1.0553414 1.35067795 2
#10 0.7220794 0.76360430 2
#11 0.4660881 0.15665334 3
#12 0.8910125 1.41889352 3
#13 0.4869978 1.07668656 3
#14 0.6355544 0.43093153 3
#15 0.6874804 1.62690746 3
Or depending on your use-case you could construct the dataframe all together.
num <- 1:3
transform(data.frame(matrix(rnorm(10 * length(num), 1,.5), ncol=2)),
sim_nr = rep(num, each = 10/2))
Upvotes: 2