Sachin D

Number of complete cases in each CSV file in a folder, as a table with columns [id], [nobs]

A folder contains 332 CSV files, each named with just an id from 1 to 332. Each file contains two columns, "sulfate" and "nitrate", holding numeric pollution levels. I want to create a table that lists the ids (the file names, as 'id') in one column and the number of complete cases in each file (as 'nobs') in another column.
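For reference, a complete case is a row with no missing value in any column, and `complete.cases()` returns one TRUE/FALSE per row. The toy data frame below uses made-up values (not taken from the actual files) to show that counting complete rows gives a different number than counting non-missing values column by column and adding them:

```r
# Toy data standing in for one pollution file (values are made up)
df <- data.frame(
  sulfate = c(1.2, NA, 3.4, NA),
  nitrate = c(0.5, 2.1, NA, NA)
)

# complete.cases() flags rows that have no NA in any column
complete.cases(df)
# [1]  TRUE FALSE FALSE FALSE

# Number of complete cases: rows where both sulfate and nitrate are present
nobs <- sum(complete.cases(df))
nobs
# [1] 1

# Counting non-NA values per column and adding them gives a different number
sum(!is.na(df$sulfate)) + sum(!is.na(df$nitrate))
# [1] 4
```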

Please suggest modifications to the code below (a completely different approach is also fine):

complete <- function(directory, id = 1:332) {
  csvfiles <- dir(directory, "*\\.csv$", full.names = TRUE)
  data <- lapply(csvfiles[id], read.csv)
  for (filedata in data) {
    d <- filedata[["sulfate"]]
    d <- d[complete.cases(d)] # remove NA values
    d1 <- filedata[["nitrate"]]
    d1 <- d1[complete.cases(d1)]
  }
  paste(id, (length(d) + length(d1)))
}

Currently the code above just pastes each id number together with one and the same count, instead of giving a separate count of complete cases for each file.
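In the question's version, `d` and `d1` are overwritten on every pass of the `for` loop, so only the last file's values survive, and `paste()` recycles that single number across all ids. Also, `dir()` returns file names in sorted (alphabetical) order, so with unpadded names `csvfiles[2]` is not necessarily "2.csv" (for example, "10.csv" sorts before "2.csv"). A minimal sketch that keeps a `for` loop but stores one count per id, assuming each file is named `<id>.csv` directly inside `directory`:

```r
# Sketch: keep a for loop, but store one count per id instead of overwriting.
# Assumes each file is named "<id>.csv" (e.g. "7.csv") inside `directory`.
complete <- function(directory, id = 1:332) {
  nobs <- integer(length(id))  # one slot per requested id
  for (i in seq_along(id)) {
    # build the path from the id, so alphabetical file order does not matter
    path <- file.path(directory, paste0(id[i], ".csv"))
    df <- read.csv(path)
    # a complete case is a row where both sulfate and nitrate are non-NA
    nobs[i] <- sum(complete.cases(df[, c("sulfate", "nitrate")]))
  }
  data.frame(id = id, nobs = nobs)
}

# Usage with two throwaway files in a temporary folder (made-up values)
dir_path <- file.path(tempdir(), "pollution_loop_demo")
dir.create(dir_path, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(2, 5, NA)),
          file.path(dir_path, "1.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(4, 6), nitrate = c(1, 7)),
          file.path(dir_path, "2.csv"), row.names = FALSE)
result <- complete(dir_path, 1:2)
print(result)
```

Preallocating `nobs` and filling it by index avoids growing a vector inside the loop, and the returned data frame has exactly one row per requested id.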

Answers (1)

chinsoon12

Some suggested modifications: read in and process each CSV file inside the same function. Use cbind to add the two columns you need, then row-bind all the data frames into one data frame.

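complete <- function(directory, id = 1:332) {
  # read each file by building its path from the id
  lsData <- lapply(id, function(n) {
    df <- read.csv(paste0(directory, "/", n, ".csv"))
    # add an id column and the file's complete-case count (repeated on every row)
    cbind(id = n, df, nobs = nrow(df[complete.cases(df), , drop = FALSE]))
  })
  # stack the per-file data frames into one data frame
  do.call(rbind, lsData)
}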
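The function above keeps every original row, so `id` and `nobs` are repeated once per observation. If only the two-column summary from the question is wanted, a minimal variant of the same `lapply()`/`do.call(rbind, ...)` pattern returns one row per file. It again assumes files are named `<id>.csv` inside `directory`:

```r
# Variant of the lapply()/do.call(rbind, ...) pattern that returns
# one row per file (id, nobs) instead of every observation.
# Assumes files are named "<id>.csv" inside `directory`.
complete <- function(directory, id = 1:332) {
  rows <- lapply(id, function(n) {
    df <- read.csv(file.path(directory, paste0(n, ".csv")))
    # sum() over the logical vector counts the complete rows directly
    data.frame(id = n, nobs = sum(complete.cases(df)))
  })
  do.call(rbind, rows)
}

# Usage with throwaway files in a temporary folder (made-up values)
demo_dir <- file.path(tempdir(), "complete_rbind_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(NA, 2, 3), nitrate = c(1, 2, 3)),
          file.path(demo_dir, "5.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(NA, NA), nitrate = c(1, NA)),
          file.path(demo_dir, "6.csv"), row.names = FALSE)
out <- complete(demo_dir, 5:6)
print(out)
```

`sum(complete.cases(df))` gives the same number as `nrow(df[complete.cases(df), , drop = FALSE])` without building the subset data frame.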
