Reputation: 263
I have the following data:
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow=10)) # the dataset: 10 rows, 200 columns
The following code creates 1000 bootstrapped datasets "x" and 1000 bootstrapped datasets "y" with 5 columns each.
colnums_boot <- replicate(1000,sample.int(200,10))
output <- lapply(1:1000, function(i) {
  Xprime <- X[, colnums_boot[1:5, i]]
  Yprime <- X[, colnums_boot[6:10, i]]
  xy <- list(x = Xprime, y = Yprime)
})
This gives me a list of lists of data frames ("xy") to which I would like to apply the code below, but I do not understand the list indexing operations.
Taking the first element of the output, list [1], which contains
$x and
$y
I would like to apply:
X = cor($x)
Y = cor($y)
separately, and then
sapply(1:10, function(row) cor(X[row,], Y[row,]))
which will give me a single value for each row, collected as "r1" for list [1].
I would like to apply this to the entire list and obtain r1, r2, ... from list [1], list [2], and so on up to 1000, then combine them into a data frame. The final data frame would have dimensions 10 by 1000.
Upvotes: 0
Views: 404
Reputation: 66819
I can't find the question where I wrote that Xprime, Yprime bit; I hope you didn't delete it...? If I remember correctly, I suggested this, since it is much more efficient to deal with matrices:
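Z <- as.matrix(X)
# one 10 x 5 slice per bootstrap replicate
Xprime2 <- array(,dim=c(10,5,1000))
Yprime2 <- array(,dim=c(10,5,1000))
# the index matrix is read column by column, so slice i holds replicate i's columns
Xprime2[] <- Z[,colnums_boot[1:5,]]
Yprime2[] <- Z[,colnums_boot[6:10,]]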
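To see that the array version lines up with the per-replicate column selection, here is a minimal sketch that rebuilds the data and compares one slice with the columns picked directly. The replicate index i and the name slice_ok are only for illustration, and array(NA_real_, ...) is used in place of array(, ...) purely for readability.

```r
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow = 10))
colnums_boot <- replicate(1000, sample.int(200, 10))

Z <- as.matrix(X)
Xprime2 <- array(NA_real_, dim = c(10, 5, 1000))

# Z[, m] treats the 5 x 1000 index matrix m as one long vector (column-major),
# so the 10 x 5000 result fills the array one replicate slice at a time
Xprime2[] <- Z[, colnums_boot[1:5, ]]

# Compare slice i with the columns selected directly for replicate i
i <- 3  # arbitrary replicate, chosen for illustration
slice_ok <- all(Xprime2[, , i] == Z[, colnums_boot[1:5, i]])
print(slice_ok)
```

Each Xprime2[, , i] is then a plain numeric matrix, so it can be passed straight to cor() without any data frame overhead.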
Anyway, in your setup, as @KarlForner commented, this will get you the correlations between the x and y columns of each replicate:
lapply(output,function(ll) cor(ll$x,ll$y))
This is also potentially inefficient when bootstrapping, since you will be computing correlations among the same 200 vectors over and over. I think it makes more sense to just compute them up front
cor(X)
and then grab the values from there...
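A minimal sketch of that idea, assuming the same X and colnums_boot as in the question: compute the full 200 x 200 correlation matrix once, then index it by the sampled column numbers. The helper name xy_cor is hypothetical.

```r
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow = 10))
colnums_boot <- replicate(1000, sample.int(200, 10))

# All pairwise column correlations, computed a single time
C <- cor(X)

# Hypothetical helper: the 5 x 5 block of x-vs-y correlations for replicate i
xy_cor <- function(i) C[colnums_boot[1:5, i], colnums_boot[6:10, i]]

# Check one replicate against the direct computation
i <- 1
direct <- cor(X[, colnums_boot[1:5, i]], X[, colnums_boot[6:10, i]])
same <- isTRUE(all.equal(xy_cor(i), direct))
print(same)
```

With this, lapply(1:1000, xy_cor) should give the same matrices as the lapply over output, without recomputing any correlation.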
As far as putting that into a data.frame, I'm not clear on what that would mean.
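One possible reading, offered as a guess rather than as what the question definitely intends: index each replicate with [[ ]], take the within-set correlation matrices cor(ll$x) and cor(ll$y), and correlate them row by row. Since each of those matrices is 5 x 5, the row index can only run over 1:5 (not 1:10), so the result would be 5 x 1000. The names row_cors and result are illustrative.

```r
set.seed(1)
X <- data.frame(matrix(rnorm(2000), nrow = 10))
colnums_boot <- replicate(1000, sample.int(200, 10))
output <- lapply(1:1000, function(i) {
  list(x = X[, colnums_boot[1:5, i]], y = X[, colnums_boot[6:10, i]])
})

# [[ ]] extracts one replicate; $ then picks the data frame inside it
first_x <- output[[1]]$x  # a 10 x 5 data frame

# For every replicate, correlate matching rows of cor(x) and cor(y)
row_cors <- sapply(output, function(ll) {
  cx <- cor(ll$x)  # 5 x 5
  cy <- cor(ll$y)  # 5 x 5
  sapply(seq_len(nrow(cx)), function(r) cor(cx[r, ], cy[r, ]))
})

# One column per replicate, one row per correlation-matrix row
result <- as.data.frame(row_cors)
print(dim(result))
```

sapply() simplifies the 1000 length-5 vectors into a matrix, which is why as.data.frame() can be applied directly.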
Upvotes: 1