Vinayak Bakshi

Reputation: 125

R code debugging and error correction understanding

I have written a function that reads a directory full of files and reports the number of completely observed cases in each data file. The function should return a data frame where the first column is the name of the file and the second column is the number of complete cases. I need help understanding the following error that the code produces:

Error in `[.data.frame`(data, i) : undefined columns selected
In addition: Warning messages:
1: In comp[i] <- !is.na(data[i]) : number of items to replace is not a multiple of replacement length
2: In comp[i] <- !is.na(data[i]) : number of items to replace is not a multiple of replacement length
3: In comp[i] <- !is.na(data[i]) : number of items to replace is not a multiple of replacement length

The code is the following:

complete<-function(directory, id=1:332){
    files.list<-list.files(directory, full.names=TRUE, pattern=".csv")
    comp<-character()
    return.data<-data.frame()
    nobs<-numeric()

    for(i in id){
        data<-read.csv(files.list[i])
        comp[i]<-!is.na(data[i])
        nobs[i]<-nrow(comp[i])
    }
    return.data<-c(id,nobs)
}

Upvotes: 0

Views: 124

Answers (2)

wici

Reputation: 1711

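Your problem is that !is.na() returns a logical value for every row rather than a single value, so you cannot insert the result into the single element comp[i]; that is what the warnings are telling you. The "undefined columns selected" error comes from data[i], which selects column i of the data frame and fails as soon as i is larger than the number of columns.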
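Both messages can be reproduced on a small scale. The sketch below uses a made-up two-column data frame (df, flags, result and err are names chosen only for this demo) in place of one of the CSV files:

```r
# Hypothetical data frame standing in for one of the CSV files
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

# df[1] selects column 1 and stays a data frame, so !is.na() yields a
# logical matrix with one entry per row, not a single value
flags <- !is.na(df[1])
print(dim(flags))

# Assigning three values into one slot triggers the warning from the question
comp <- character()
result <- tryCatch(
  { comp[1] <- flags; "no warning" },
  warning = function(w) conditionMessage(w)
)
print(result)

# Asking for a column that does not exist raises the reported error
err <- tryCatch(df[3], error = function(e) conditionMessage(e))
print(err)
```

The captured messages should match the warning and the error quoted in the question, assuming an English locale.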

R has a function, complete.cases, which does exactly what you attempted. With it, your function would look like this:

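complete<-function(directory, id=1:332){
  # pattern is a regular expression; "\\.csv$" matches names ending in .csv
  files.list<-list.files(directory, full.names=TRUE, pattern="\\.csv$")
  nobs <- numeric(length(id))
  # loop over positions in id so nobs lines up with id, even for id=c(2,4)
  for(k in seq_along(id)){
    data<-read.csv(files.list[id[k]])
    # complete.cases() is TRUE for each row without missing values
    nobs[k]<-sum(complete.cases(data))
  }
  # return the data frame visibly instead of ending on an assignment
  data.frame(id,nobs)
}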
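To try the complete.cases approach without the original data set, something like the following can be used. It is only a sketch: count_complete is a made-up helper name, the two CSV files are invented test data written to a temporary directory, and files are assumed to be addressed by their position in the sorted listing that list.files returns.

```r
# Hypothetical helper mirroring the complete.cases approach
count_complete <- function(directory, id) {
  files <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  # One count per requested file, in the same order as id
  nobs <- vapply(files[id],
                 function(f) sum(complete.cases(read.csv(f))),
                 integer(1), USE.NAMES = FALSE)
  data.frame(id = id, nobs = nobs)
}

# Build a throwaway directory with two small, invented CSV files
demo.dir <- file.path(tempdir(), "complete_demo")
dir.create(demo.dir, showWarnings = FALSE)
write.csv(data.frame(x = c(1, NA, 3), y = c(4, 5, NA)),
          file.path(demo.dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(x = c(1, 2, 3), y = c(4, 5, 6)),
          file.path(demo.dir, "002.csv"), row.names = FALSE)

# Request the files in reverse order to check that rows follow id
res <- count_complete(demo.dir, id = 2:1)
print(res)
```

The first file has one fully observed row and the second has three, so the row for id 2 should report 3 and the row for id 1 should report 1.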

That aside, your code has several flaws I want to point out:

  • why is comp of type character?
  • allocate the size of a vector if you know it beforehand (nobs <- numeric(length(id)))
  • do you really want to check only column i of your i-th loaded data.frame for missing values?
  • if you assign return.data <- c(id,nobs), return.data will be a single numeric vector with the ids at the beginning and the nobs at the end, not a data frame.
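The last point is easy to check in isolation. The values below are invented, and flat, tab and f are names used only for this illustration:

```r
# Invented example values
id <- c(2, 4)
nobs <- c(10, 20)

# c() concatenates: one vector of length 4, ids first, counts after
flat <- c(id, nobs)
print(flat)

# data.frame() keeps the pairing: 2 rows, one column per variable
tab <- data.frame(id, nobs)
print(tab)

# A function whose last expression is an assignment returns its value
# invisibly, so calling it at the console prints nothing
f <- function() out <- data.frame(id, nobs)
print(withVisible(f())$visible)
```

Ending the function with the data frame itself (or wrapping it in return()) makes the result print when the function is called at the console.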

Upvotes: 2

RomRom

Reputation: 322

You need to give both a row and a column index to your data so that it selects all rows of column i, e.g. comp[i]<-!is.na(data[ ,i])
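The difference between the two indexing forms can be seen on a small, made-up data frame (df and the other names here are only for illustration). Note that data[ ,i] still gives one value per row, so under this sketch's assumptions it would have to be summarised, for example with sum, before being stored in a single slot:

```r
# Hypothetical data frame with some missing values
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

one.col.df <- df[2]    # single bracket index: still a data frame
col.vec <- df[, 2]     # row and column index: a plain numeric vector

# One logical value per row of the selected column
ok <- !is.na(col.vec)
print(ok)

# Collapse to a single count before storing it in one element
n.ok <- sum(ok)
print(n.ok)
```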

Upvotes: 0
