megashigger

Reputation: 9053

Undefined column selected in R

I get the error "Error in '[.data.frame'(current_dataset, complete.cases(current_dataset)) : undefined columns selected". I have tried to track down the cause but can't figure it out.

What I want the function to do: first, it goes through several files that contain sulfate and nitrate information for different locations. The file names all contain 'csv', so myfiles is a vector that lets me refer to the files easily. Then I want to loop through the 332 files, read each one, and check whether it has enough complete cases (the threshold is an argument to the function). If it does, I want to add all of its complete cases (sulfate and nitrate data) to a data frame defined earlier. Finally, I want to return the correlation between sulfate and nitrate.

corr <- function(directory, threshold = 0) {
    #store data frame that holds sulfate amount and nitrate amount that meet threshold and are complete cases
    data <- data.frame(sulfate = numeric(0), nitrate = numeric(0))

    #set working directory
    setwd(directory)

    #get file names
    myfiles <- list.files(pattern = "csv")

    #loop through files
    for(i in 1:332) {

        #read each file
        current_dataset <- read.csv(myfiles[i])

        #check if there are enough complete cases to meet threshold
        if(sum(complete.cases(current_dataset)) > threshold) {

            #get complete cases
            complete_cases <- current_dataset[complete.cases(current_dataset)]

            #add sulfate and nitrate info to table
            data <- rbind(data, data.frame(sulfate = complete_cases$sulfate[i], nitrate = complete_cases$nitrate)[i])
        }
    }
    #get correlation
    cor(data)
}

Upvotes: 0

Views: 19481

Answers (1)

Matthew Lundberg

Reputation: 42649

Here is the error:

complete_cases <- current_dataset[complete.cases(current_dataset)]

Should be:

complete_cases <- current_dataset[complete.cases(current_dataset), ]

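A single argument to [ on a data frame is taken as a set of columns to select. Here that argument is a logical vector with one element per row, so it refers to columns that do not exist. To select rows instead, include a comma after the row index and omit the column selection.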
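
The difference can be reproduced on a small data frame. This is only a sketch: the data frame df, its values, and the names keep, result, complete_rows and acc are made up for illustration and are not from the question's files.

```r
# Toy data frame standing in for one monitor file (values are made up).
df <- data.frame(
  sulfate = c(1.2, NA, 3.4, 2.2),
  nitrate = c(0.5, 0.7, NA, 0.9),
  ID      = c(1L, 1L, 1L, 1L)
)

# One logical flag per row: TRUE where the row has no missing values.
keep <- complete.cases(df)

# Single index: the four flags are read as a column selector, but df has
# only three columns, so R signals an error. tryCatch() captures the
# message so the script can continue.
result <- tryCatch(df[keep], error = function(e) conditionMessage(e))
print(result)

# Row index, comma, empty column index: keeps the complete rows and
# all columns.
complete_rows <- df[keep, ]
print(complete_rows)

# Appending the complete rows to an accumulator, as the question's loop
# intends (assumption: the whole sulfate and nitrate columns are wanted).
acc <- data.frame(sulfate = numeric(0), nitrate = numeric(0))
acc <- rbind(acc, complete_rows[, c("sulfate", "nitrate")])
```

Once the comma is fixed, the rbind line in the question probably needs a second look as well: complete_cases$sulfate[i] picks only the i-th value, and the trailing [i] selects the i-th column of the new data frame, which is unlikely to be what was meant. Passing the two columns whole, as in the last step of the sketch, is one way to do it.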

Upvotes: 5
