Shawn

Reputation: 3369

R vector not printing expected output

corr <- function(directory, threshold = 0){

  #get all the cases that have non-NA values
  complete_cases <- complete(directory)
  #get all the observations over the threshold amount
  filter_cases <- complete_cases[complete_cases[["nobs"]] > threshold, ]

  #The returned data frame contains two columns "ID" and "nobs"

  #get all file names in a vector
  all_files <- list.files(directory, full.names=TRUE)

  correlation <- vector("numeric")

  for(i in as.numeric(filter_cases[["ID"]])){
    #get all the files that are in the filter_cases
    output <- read.csv(all_files[i])
    #remove all NA values from the data
    output <- output[complete.cases(output), ]
    #get each of the correlations and store them
    correlation[i] <- cor(output[["nitrate"]], output[["sulfate"]])
  }

  correlation
}

My expected output from this is something like:

corr("directory", 200)

[1] -1.023 0.0456 0.8231 etc

What I get is:

NA NA -1.023 NA NA
NA NA NA 0.0456 NA
0.8231 NA NA NA NA etc

I feel like I am missing something simple, since print(cor(output[["nitrate"]], output[["sulfate"]])) gives me basically what I expect. The output must be a vector, as I plan to use this function inside other functions.

Upvotes: 0

Views: 228

Answers (1)

Andy McKenzie

Reputation: 456

The problem is most likely the way your for loop indexes the result. You use i, the file ID, both to pick the file to read and as the position in correlation. The IDs that pass the threshold are not consecutive, so the positions in between are never assigned and R fills them with NA. Without access to your data it is hard to know for sure, but the upper lines appear to exist so that you only loop over certain files. If so, the loop variable is doing two jobs, and it makes sense to index correlation with an explicit counter instead, as below.

# replaces the for loop inside corr(); correlation is still created beforehand
cor_index <- 0
for(i in as.numeric(filter_cases[["ID"]])){
    #get all the files that are in the filter_cases
    output <- read.csv(all_files[i])
    #remove all NA values from the data
    output <- output[complete.cases(output), ]
    #advance the counter so results are stored back to back
    cor_index <- cor_index + 1
    correlation[cor_index] <- cor(output[["nitrate"]], output[["sulfate"]])
}
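
To see why the gaps appear, here is a minimal sketch that needs no data files. The IDs 3, 9 and 11 are hypothetical, chosen only to line up with the output in the question, and the three values are copied from that output rather than computed. It shows that assigning past the end of a vector pads the skipped positions with NA, and that indexing by position in the ID list (here with seq_along instead of a manual counter) keeps the result dense.

```r
ids  <- c(3, 9, 11)                 # hypothetical IDs that passed the threshold
vals <- c(-1.023, 0.0456, 0.8231)   # stand-ins for the cor() results

# Indexing by file ID, as in the question: skipped positions become NA
sparse <- vector("numeric")
for (k in seq_along(ids)) {
  sparse[ids[k]] <- vals[k]
}
print(sparse)

# Indexing by position in the ID list: no gaps
dense <- numeric(length(ids))
for (k in seq_along(ids)) {
  dense[k] <- vals[k]   # in corr() this would be the cor() result for all_files[ids[k]]
}
print(dense)
```

Dropping the gaps afterwards with correlation[!is.na(correlation)] would also give a dense vector, but it would silently discard any correlation that is genuinely NA, so fixing the index is probably the safer choice.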

Upvotes: 1
