Applying a function to a list of data frames

Question

I have the following data:

 set.seed(1)
 X <- data.frame(matrix(rnorm(2000), nrow = 10))  # the dataset: 10 rows x 200 columns

The following code creates 1000 bootstrapped datasets "x" and 1000 bootstrapped datasets "y" with 5 columns each.

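 colnums_boot <- replicate(1000, sample.int(200, 10))  # 10 x 1000 matrix of column indices
 output <- lapply(1:1000, function(i) {
   Xprime <- X[, colnums_boot[1:5, i]]   # first 5 sampled columns
   Yprime <- X[, colnums_boot[6:10, i]]  # last 5 sampled columns
   xy <- list(x = Xprime, y = Yprime)    # value returned for replicate i
 })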

This gives me `output`, a list of 1000 lists (one "xy" list per replicate, each holding the data frames x and y). I would like to apply the code below to each of them, but I do not understand the list indexing operations.
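Since list indexing is the sticking point, here is a small sketch (it rebuilds the question's data; `first` is just an illustrative name) of how `[[` and `$` reach the data frames inside `output`, compared with `[`, which keeps the list wrapper.

```r
# Rebuild the data from the question
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow = 10))       # 10 rows x 200 columns
colnums_boot <- replicate(1000, sample.int(200, 10))  # 10 x 1000 column indices
output <- lapply(1:1000, function(i) {
  list(x = X[, colnums_boot[1:5, i]],
       y = X[, colnums_boot[6:10, i]])
})

# [[ ]] extracts one element; [ ] returns a sub-list that still wraps it
first <- output[[1]]       # a list with components x and y
class(output[1])           # "list" (a list of length 1)
dim(first$x)               # 10 5
dim(output[[1]][["y"]])    # 10 5 ($y and [["y"]] are equivalent here)
```

In short, `output[[i]]$x` is the x data frame of replicate i, which is what `$x` in the pseudocode needs to be replaced with.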

Taking the first element, output[[1]], which has the components $x and $y, I would like to apply the following code:

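 CX <- cor(output[[1]]$x)   # correlation matrix of the columns of x
 CY <- cor(output[[1]]$y)   # correlation matrix of the columns of y
 sapply(seq_len(nrow(CX)), function(row) cor(CX[row, ], CY[row, ]))

which gives me a single correlation for each row, the vector "r1" for output[[1]]. (Since cor() correlates columns, CX and CY are 5 x 5 here; correlating the 10 rows would require cor(t(output[[1]]$x)) instead.)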

I would like to apply this to the entire list, obtaining r1, r2, ... from output[[1]], output[[2]], ... up to output[[1000]], and combine them into a data frame with one column per replicate. It should end up as a 10 x 1000 data frame.
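A minimal sketch of the whole pipeline, assuming the row-wise comparison of cor(x) and cor(y) is meant exactly as written; `row_cors` is a hypothetical helper name. Because cor() correlates columns, each replicate yields 5 values here, so the result is 5 x 1000; if the 10 rows were meant, replacing `cor(el$x)` with `cor(t(el$x))` (and likewise for y) would give the 10 x 1000 shape.

```r
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow = 10))
colnums_boot <- replicate(1000, sample.int(200, 10))
output <- lapply(1:1000, function(i) {
  list(x = X[, colnums_boot[1:5, i]], y = X[, colnums_boot[6:10, i]])
})

# Hypothetical helper: row-by-row correlation of cor(x) with cor(y)
row_cors <- function(el) {
  CX <- cor(el$x)   # 5 x 5: correlations among the columns of x
  CY <- cor(el$y)   # 5 x 5: correlations among the columns of y
  sapply(seq_len(nrow(CX)), function(row) cor(CX[row, ], CY[row, ]))
}

# sapply binds each replicate's vector as a column: a 5 x 1000 matrix
r <- sapply(output, row_cors)
result <- as.data.frame(r)
names(result) <- paste0("r", seq_along(output))   # r1, r2, ..., r1000
```

sapply is used because every call returns a vector of the same length, so the results simplify into a matrix that as.data.frame turns into one column per replicate.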

Frank · Accepted Answer

I can't find the question where I wrote that Xprime/Yprime code; I hope you didn't delete it. If I remember correctly, I suggested the following, since working with matrices is much more efficient:

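Z <- as.matrix(X)                                 # numeric matrix instead of a data.frame
Xprime2 <- array(NA_real_, dim = c(10, 5, 1000))  # 10 rows x 5 columns x 1000 replicates
Yprime2 <- array(NA_real_, dim = c(10, 5, 1000))
Xprime2[] <- Z[, colnums_boot[1:5, ]]   # fills one 10 x 5 slice per replicate
Yprime2[] <- Z[, colnums_boot[6:10, ]]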
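A runnable sketch of this matrix-based setup, assuming the question's X and colnums_boot. It checks that slice i of the array holds the same columns as replicate i, then computes the per-replicate x-vs-y correlations over the third dimension with vapply.

```r
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow = 10))
colnums_boot <- replicate(1000, sample.int(200, 10))

Z <- as.matrix(X)
Xprime2 <- array(NA_real_, dim = c(10, 5, 1000))
Yprime2 <- array(NA_real_, dim = c(10, 5, 1000))
# A matrix used as the column index is read as a plain vector (column-major),
# so Z[, ...] is 10 x 5000 and fills the array one 10 x 5 slice at a time
Xprime2[] <- Z[, colnums_boot[1:5, ]]
Yprime2[] <- Z[, colnums_boot[6:10, ]]

# Slice 1 matches the columns drawn for replicate 1
all.equal(Xprime2[, , 1], unname(Z[, colnums_boot[1:5, 1]]))  # TRUE

# One 5 x 5 cross-correlation matrix per slice, stacked into 5 x 5 x 1000
cross <- vapply(seq_len(dim(Xprime2)[3]),
                function(i) cor(Xprime2[, , i], Yprime2[, , i]),
                matrix(0, 5, 5))
```

Giving vapply a 5 x 5 template makes it return a 5 x 5 x 1000 array and fail loudly if any replicate produced a different shape.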

Anyway, in your setup, as @KarlForner commented, this will get you the correlations between the columns of x and the columns of y for each replicate:

lapply(output,function(ll) cor(ll$x,ll$y))
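The lapply call returns a list of 1000 separate 5 x 5 matrices. A sketch (assuming the question's data; `cors`, `cor_arr` and `first_pair` are illustrative names) of stacking them with base R's simplify2array, so that one correlation can be followed across all replicates:

```r
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow = 10))
colnums_boot <- replicate(1000, sample.int(200, 10))
output <- lapply(1:1000, function(i) {
  list(x = X[, colnums_boot[1:5, i]], y = X[, colnums_boot[6:10, i]])
})

# One 5 x 5 matrix per replicate (rows: x columns, columns: y columns).
# unname() drops the column labels, which differ between replicates.
cors <- lapply(output, function(ll) unname(cor(ll$x, ll$y)))

# Stack the 1000 matrices into one 5 x 5 x 1000 array
cor_arr <- simplify2array(cors)

# All 1000 bootstrap values of the correlation between x[, 1] and y[, 1]
first_pair <- cor_arr[1, 1, ]
```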

This is also potentially inefficient when bootstrapping, since every replicate draws its columns from the same 200 vectors, so the same correlations get computed over and over. I think it makes more sense to compute them all once up front with cor(X) and then look up the values you need from that matrix.
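A sketch of that precompute-then-look-up idea, assuming the question's X and colnums_boot; `boot_cor` is a hypothetical helper. Because a Pearson correlation depends only on the two columns involved (and X has no missing values), replicate i's x-vs-y block is simply a sub-matrix of cor(X).

```r
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow = 10))
colnums_boot <- replicate(1000, sample.int(200, 10))

# All 200 x 200 pairwise column correlations, computed once
C <- cor(X)

# Hypothetical helper: replicate i's 5 x 5 block, looked up instead of recomputed
boot_cor <- function(i) C[colnums_boot[1:5, i], colnums_boot[6:10, i]]

# Same result as correlating the sampled columns directly
all.equal(boot_cor(7),
          cor(X[, colnums_boot[1:5, 7]], X[, colnums_boot[6:10, 7]]))  # TRUE

all_boot <- lapply(seq_len(ncol(colnums_boot)), boot_cor)  # 1000 blocks
```

The trade-off is memory for one 200 x 200 matrix in exchange for 1000 cheap subsetting operations instead of 1000 correlation computations.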

As far as putting that into a data.frame goes, I'm not clear on what that would mean, since each element of the result is a 5 x 5 matrix.
