Tatyana Pinyayev

Reputation: 1

function with FOR and IF loops

I am writing a function that goes through a list of files in a directory and counts the number of complete cases in each file. If the number of complete cases is above a given threshold, a correlation should be calculated. The output must be a numeric vector of correlations for all files that meet the threshold requirement. This is what I have so far, and it gives me Error: unexpected '}' in "}". Full disclosure: I am a complete newbie, as in I wrote my first code 2 weeks ago. What am I doing wrong?

correlation <- function (directory, threshhold = 0) {
    all_files <- list.files(path = getwd())
    correlations_list <- numeric()
    for (i in seq_along(all_files)) {
        dataFR2 <- read.csv(all_files[i])
        c <- c(sum(complete.cases(dataFR2)))
        if c >= threshhold {
            d <- cor(dataFR2$sulfate, dataFR2$nitrate, use = "complete.obs", method = c("pearson"))
            correlations_list <- c(correlations_list, d)
        }
    }
    correlations_list
}

Upvotes: 0

Views: 23

Answers (1)

Gregor Thomas

Reputation: 145775

"Unexpected *" errors are syntax errors, often caused by a missing parenthesis, comma, or curly bracket. In this case, you need to change if c >= threshhold { to if (c >= threshhold) {. Like for and while, if requires its condition to be wrapped in parentheses.
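A quick way to check this kind of thing without running the whole function is to ask R whether a snippet parses at all. The sketch below is only an illustration: the helper name parses_ok is made up here, and it relies on parse(), which checks syntax without evaluating the code (so n and threshhold do not need to exist).

```r
# Hypothetical helper (not from the question): returns TRUE if `code`
# is syntactically valid R, FALSE if parse() raises a syntax error.
parses_ok <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

bad  <- "if n >= threshhold { d <- 1 }"    # condition not in parentheses
good <- "if (n >= threshhold) { d <- 1 }"  # condition wrapped in parentheses

parses_ok(bad)   # FALSE
parses_ok(good)  # TRUE
```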

I'd also strongly recommend that you not use c as a variable name. c() is the most commonly used R function, and giving an object the same name will make your code look very strange to anyone else reading it.
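As a small illustration of why this reads so strangely: R does still find the function c() when a vector named c exists, because a call looks up functions specifically, so the code runs but is hard to follow. The variable name n_complete below is just one possible descriptive alternative.

```r
# Works, but confusing: `c` is both a data vector and a function here.
c <- c(1, 2, 3)
c <- c(c, 4)      # the call c(...) still finds the function c()
c                 # 1 2 3 4

# Clearer: a descriptive name leaves no doubt about what is data.
n_complete <- c(1, 2, 3)
n_complete <- c(n_complete, 4)
```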

Lastly, I'd recommend that you make your output the same length as the number of files. As you have it, there won't be any way to know which files met the threshold to have their correlations calculated. I'd make correlations_list have the same length as the number of files, store NA for files that fall below the threshold, and add names to it so you know which correlation belongs to which file. This has the side benefit of not "growing an object in a loop", which is an anti-pattern known for its inefficiency. A rewritten function would look something like this:

correlation <- function (directory, threshhold = 0) {
    all_files <- list.files(path = getwd())
    correlations_list <- numeric(length(all_files)) ## initialize to full length
    for (i in seq_along(all_files)) {
        dataFR2 <- read.csv(all_files[i])
        n_complete <- sum(complete.cases(dataFR2))
        if (n_complete >= threshhold) {
            d <- cor(dataFR2$sulfate, dataFR2$nitrate, use = "complete.obs", method = c("pearson"))
        } else {
            d <- NA
        }
        correlations_list[i] <- d
    }
    names(correlations_list) <- all_files
    correlations_list
}
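Note that the function above, like the original, never uses its directory argument and reads from the working directory instead. If the intent is to read from directory, a variant could look something like the sketch below. This is an assumption about what you want, not a required change: the name correlation_in_dir, the .csv filter, and the demo files are all made up for illustration.

```r
# Variant that assumes `directory` is the folder to read from.
correlation_in_dir <- function(directory, threshhold = 0) {
  # full.names = TRUE returns paths that read.csv() can open from anywhere
  all_files <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
  correlations <- rep(NA_real_, length(all_files))  # NA unless threshold is met
  for (i in seq_along(all_files)) {
    dat <- read.csv(all_files[i])
    n_complete <- sum(complete.cases(dat))
    if (n_complete >= threshhold) {
      correlations[i] <- cor(dat$sulfate, dat$nitrate, use = "complete.obs")
    }
  }
  names(correlations) <- basename(all_files)
  correlations
}

# Made-up demo data: a.csv has 4 complete rows, b.csv has only 1.
demo_dir <- file.path(tempdir(), "correlation_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, 2, 3, 4), nitrate = c(2, 4, 6, 8)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(NA, 2, 1)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

result <- correlation_in_dir(demo_dir, threshhold = 2)
result  # a.csv gets a correlation; b.csv stays NA (below the threshold)
```

Because the result is named by file, you can drop the files that missed the threshold afterwards with result[!is.na(result)].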

Upvotes: 1
