Reputation: 1
corr <- function(directory, threshold = 0) {
  files_full <- list.files(directory, full.names = TRUE)
  v <- vector()
  for (i in 1:10) {
    a <- read.csv(files_full[i])
    b <- subset(a, (!is.na(a[, 2])) & (!is.na(a[, 3])))
    c <- length(b[, 4])
    if (c > threshold) {
      d <- cor(b[, 2], b[, 3])
    } else {
      d <- vector(mode = "numeric", length = 0)
    }
    v <- rbind(v, d)
  }
  v
}
cr <- corr("specdata", 0)
I have a set of .csv files in a directory and want to pass the directory as an argument to the function above. For each file, I want to count the complete cases and, provided that number is greater than a threshold value set via the second function argument, compute the correlation between the values held in two columns of the file (cols 2 and 3). The ultimate aim is a vector containing the correlation for each file for which the threshold condition is met. If the threshold condition isn't met, I want to return a numeric vector of length 0.
The number of complete cases in the first file is 117. The function above works fine so long as the threshold is below this number. If I set the threshold to 117 or higher, the function returns a vector of length 0, and I get the warning
In rbind(v, d) :
number of columns of result is not a multiple of vector length (arg 2)
It seems like the condition in the if statement is getting stuck on the value of the number of complete cases in the first file, rather than looping through.
I'd be very grateful if someone could explain where I'm going wrong!
Upvotes: 0
Views: 56
Reputation: 3447
rbind is used to bind the rows of vectors or matrices. If the threshold is >= 117, the d computed for the first file is a vector of length zero. Row-binding two vectors of length zero gives a matrix with 2 rows and 0 columns (see e.g. dim(rbind(vector(), vector()))). Combining this zero-column matrix with a non-zero-length vector does not work: the matrix fixes the number of columns at 0, so the correlation values from later files have nowhere to go. That is what the warning says.
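The rbind behaviour described here can be checked in isolation. The snippet below is a minimal sketch that reuses the names v and d from the question; the value 0.5 is just a stand-in for a correlation and is not taken from the data.

```r
v <- vector()                              # logical(0), as in the question
d <- vector(mode = "numeric", length = 0)  # what the else branch produces

m <- rbind(v, d)
dim(m)   # 2 rows, 0 columns

# Binding a real value (stand-in: 0.5) onto the zero-column matrix triggers
# the warning from the question; the result still has 0 columns.
m2 <- suppressWarnings(rbind(m, 0.5))
dim(m2)  # 3 rows, 0 columns

# c() simply skips zero-length pieces, so it does not have this problem.
c(d, 0.5)
```

So once the first file falls below the threshold, every later rbind call keeps the result at zero columns, which is why it looks as if the loop were stuck on the first file.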
A better way to achieve your goal is to apply a function that computes the correlation to each of the files. Instead of returning a zero-length vector you could use NA.
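correlation_of_large_file <- function(file, threshold = 0) {
  df <- read.csv(file)
  # keep only the rows where both column 2 and column 3 are observed
  df <- df[complete.cases(df[, 2:3]), ]
  if (nrow(df) > threshold)
    cor(df[, 2], df[, 3])
  else
    NA_real_  # numeric NA, so every result has the type vapply expects
}
files_full <- list.files("specdata", full.names = TRUE)
vapply(files_full, correlation_of_large_file, numeric(1), threshold = 117)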
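To try this approach without the specdata directory, here is a small self-contained sketch. The helper name cor_if_enough, the temporary directory and the two tiny CSV files are all made up for illustration; they are not the real data. It also shows one way to drop the NA entries afterwards, if you only want the files that met the threshold.

```r
# Hypothetical helper: correlation of columns 2 and 3 over the complete rows,
# or NA if there are not more than `threshold` complete rows.
cor_if_enough <- function(file, threshold = 0) {
  df <- read.csv(file)
  ok <- complete.cases(df[, 2:3])
  if (sum(ok) > threshold) cor(df[ok, 2], df[ok, 3]) else NA_real_
}

# Two small made-up files in a temporary directory (not the real specdata).
tmp <- file.path(tempdir(), "corr_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(id = 1:5, x = c(1, 2, 3, 4, NA), y = c(2, 4, 6, 8, 10)),
          file.path(tmp, "001.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:3, x = c(1, NA, 3), y = c(3, 2, NA)),
          file.path(tmp, "002.csv"), row.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
res <- vapply(files, cor_if_enough, numeric(1), threshold = 2)

unname(res)        # 001.csv has 4 complete rows; 002.csv has only 1, so NA
res[!is.na(res)]   # keep only the files that met the threshold
```

Returning NA keeps the result the same length as the list of files, so each value can still be matched to its file; filtering at the end gives the shorter vector the question asks for.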
Upvotes: 2