Reputation: 41
I have about 13 files and I want to compute three correlations for each of them. All the files have the same structure; only the values differ.
For example:
v1 v2 v3 v4 v5 v6 v7 v8 ........... v50
The first correlation is between v6 and v20, the second between v7 and v21, and the third between v8 and v22.
My data have missing values.
Doing this manually for each file would make the script far too long, so I want to loop over all the files. Unfortunately I am not experienced with loops, and I have already tried a lot. I need help, please.
Upvotes: 2
Views: 85
Reputation: 887118
If 'd1', 'd2', ... 'd13' are the datasets and the columns are in the same order, we can place the datasets in a list and get the cor for the specified columns. Because cor on two sets of columns returns the full cross-correlation matrix, diag keeps only the matching pairs (v6 with v20, v7 with v21, v8 with v22). There are options in ?cor to compute the covariances in the presence of missing values. Here, I used na.or.complete. We can change it according to the need.
lapply(mget(paste0('d', 1:13)), function(x)
    diag(cor(x[, 6:8], x[, 20:22], use = 'na.or.complete')))
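To make the effect of the `use` argument concrete, here is a minimal sketch on made-up data. The data frame `d1`, its column names, and the positions of the injected NAs are assumptions for illustration only. It checks that the diagonal approach matches a pair-by-pair calculation after dropping every row with an NA in any of the six columns, and shows `pairwise.complete.obs` as one possible alternative when you would rather keep all rows that are complete for each pair.

```r
set.seed(1)
# Hypothetical stand-in for one dataset: 30 rows, 50 columns named v1..v50
d1 <- as.data.frame(matrix(rnorm(30 * 50), ncol = 50))
names(d1) <- paste0("v", 1:50)
d1$v6[c(2, 5)] <- NA   # inject a few missing values
d1$v21[9] <- NA

# Approach from the answer: cross-correlation matrix, keep only the diagonal
r_diag <- diag(cor(d1[, 6:8], d1[, 20:22], use = "na.or.complete"))

# Same thing computed pair by pair, after dropping every row that has an NA
# in any of the six columns (casewise deletion)
cc <- complete.cases(d1[, c(6:8, 20:22)])
r_manual <- mapply(function(a, b) cor(d1[cc, a], d1[cc, b]), 6:8, 20:22)

# Alternative: each pair uses all rows that are complete for that pair only
r_pair <- diag(cor(d1[, 6:8], d1[, 20:22], use = "pairwise.complete.obs"))
names(r_pair) <- c("v6_v20", "v7_v21", "v8_v22")

all.equal(r_diag, r_manual)
r_pair
```

With casewise deletion, a missing value in v6 also removes that row from the v8/v22 correlation; the pairwise option avoids that, at the cost of each correlation being based on a different set of rows.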
It may be better to read the files into a list directly than to create individual data.frame objects in the global environment, assuming that the files are all in the working directory.
# change the pattern as needed; the dot is escaped so it matches a literal '.'
files <- list.files(pattern = 'file\\d+\\.txt')
lapply(files, function(x) {
    x1 <- read.table(x, header = TRUE)
    diag(cor(x1[, 6:8], x1[, 20:22], use = 'na.or.complete'))
})
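If a single table is easier to work with than a list, the same loop can label its results and stack them. The sketch below is only an illustration: it first writes three small made-up files into a temporary directory (the directory name, file names, and the 22-column layout are assumptions), then returns one row per file and one column per correlation.

```r
# Create three small example files in a temporary directory (illustration only)
set.seed(42)
tmp <- file.path(tempdir(), "cor_demo")
dir.create(tmp, showWarnings = FALSE)
for (i in 1:3) {
  d <- as.data.frame(matrix(rnorm(20 * 22), ncol = 22))
  names(d) <- paste0("v", 1:22)
  d[sample(20, 2), 6] <- NA          # a couple of missing values in v6
  write.table(d, file.path(tmp, paste0("file", i, ".txt")), row.names = FALSE)
}

files <- list.files(tmp, pattern = "^file\\d+\\.txt$", full.names = TRUE)

# sapply returns one column per file; t() turns that into one row per file
res <- t(sapply(files, function(f) {
  x <- read.table(f, header = TRUE)
  r <- diag(cor(x[, 6:8], x[, 20:22], use = "na.or.complete"))
  setNames(r, c("v6_v20", "v7_v21", "v8_v22"))
}))
rownames(res) <- basename(files)
res
```

For the real data, only the `list.files` call and the column indices should need adjusting; `res` can then be written out with `write.csv` if needed.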
Upvotes: 2
Reputation: 1363
Here's a brute-force version (with data generation included). It will probably work for your purpose, though a little more information about the structure of your data and task could help make this more efficient:
N <- 10
k <- 50
d <- data.frame(matrix(runif(N * k), ncol = k))
sapply(20:k, function(col) cor(d[,col - 14], d[,col]))
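Since the question mentions missing values, note that `cor` on two vectors returns NA by default as soon as either one contains an NA. A small sketch of one way to handle that, restricted to the three pairs from the question (the position of the injected NA is an assumption for illustration):

```r
set.seed(7)
N <- 10
k <- 50
d <- data.frame(matrix(runif(N * k), ncol = k))
d[3, 6] <- NA                      # one missing value in column 6

# Default behaviour: an NA in either column makes that correlation NA
r_default <- sapply(20:22, function(col) cor(d[, col - 14], d[, col]))

# Drop incomplete pairs separately for each correlation
r_complete <- sapply(20:22, function(col)
  cor(d[, col - 14], d[, col], use = "complete.obs"))

r_default
r_complete
```

Only the first correlation changes here, because only column 6 contains a missing value.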
Edit: The question has been edited since I posted this, so I'm not sure this is actually what you're after now.
Upvotes: 1