nzhanggh

Reputation: 131

Trouble using mutate within a for loop

I'm trying to write a function called complete that takes a directory of csv files (named 1-332) and a file number, and prints the number of rows with no NA in either the sulfate or the nitrate column. I use mutate to add a column called nob that is 1 when neither column is NA, then take the sum of nob as my answer, but I get an error saying that the object nob is not found. How can I fix this? The directory in question is downloaded by the block of code below.

library(tidyverse)
if(!file.exists("rprog-data-specdata.zip")) {
  temp <- tempfile()
  download.file("https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fspecdata.zip",temp)
  unzip(temp)
  unlink(temp)
}

complete <- function(directory, id = 1:332){
  #create a list of files
  files_full <- list.files(directory, full.names = TRUE)
  #create an empty data frame
  dat <- data.frame()
  for(i in id){
    dat <- rbind(dat, read.csv(files_full[i]))
  }
  mutate(dat, nob = ifelse(!is.na(dat$sulfate) & !is.na(dat$nitrate), 1, 0))
  x <- summarise(dat, sum = sum(nob))

return(x)
}

When I run the following code, the result should be 117, but I get an error message instead:

complete("specdata", 1)

Error: object 'nob' not found

Upvotes: 0

Views: 285

Answers (2)

Parfait

Reputation: 107567

As mentioned, avoid growing objects inside a loop. Instead, build a list of data frames, one per csv, then call rbind once. In fact, base R alone (i.e., tinyverse) covers everything needed here:

complete <- function(directory, id = 1:332){
  # create a list of files
  files_full <- list.files(directory, full.names = TRUE)

  # create a list of data frames
  df_list <- lapply(files_full[id], read.csv)

  # build a single data frame with nob column
  dat <- transform(
    do.call(rbind, df_list),
    nob = ifelse(!is.na(sulfate) & !is.na(nitrate), 1, 0)
  )

  return(sum(dat$nob))
}
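As a side note, the counting step can probably be shortened further with base R's complete.cases, which flags rows that have no NA in the columns passed to it. The snippet below is only a minimal sketch on made-up toy data (the toy data frame and its values are assumptions for illustration, not the real specdata files):

```r
# Toy data standing in for one monitor file (values are made up)
toy <- data.frame(
  sulfate = c(1.2, NA, 3.4, NA),
  nitrate = c(0.5, 0.7, NA, NA)
)

# TRUE only for rows where both columns are observed
ok <- complete.cases(toy[, c("sulfate", "nitrate")])

# Summing a logical vector counts its TRUE values
n_complete <- sum(ok)
print(n_complete)
```

Inside the function above, the same idea would replace the transform/ifelse step with a sum over complete.cases of the two columns.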

Upvotes: 1

Daniel D. Sjoberg

Reputation: 11650

I think the function below should get what you need. Rather than a loop, I prefer map (or apply) in this setting. It's difficult to say where your code went wrong without the error message or an example I can run on my machine, however.

Happy Coding, Daniel

library(tidyverse)
complete <- function(directory, id = 1:332){
  #create a list of files
  files_full <- list.files(directory, full.names = TRUE)

  # cycle over each requested file to get the number of nonmissing rows
  purrr::map_int(
    files_full[id], # subset by id so the id argument is actually used
    ~ read.csv(.x) %>% # read in datafile
      dplyr::select(sulfate, nitrate) %>% # select two columns of interest
      tidyr::drop_na() %>% # drop rows with a missing value in either column
      nrow() # get the number of rows with no missing data
  ) %>%
    sum() # sum the total number of rows not missing among all files
}
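For what it's worth, one plausible cause of the "object 'nob' not found" error in the question is that mutate returns a new data frame rather than changing dat in place, so when its result is not assigned, the later summarise call sees a dat with no nob column. A minimal sketch of that, using made-up toy data (the toy data frame is an assumption for illustration only):

```r
library(dplyr)

# Toy data standing in for the combined csv files (values are made up)
toy <- data.frame(
  sulfate = c(1.2, NA, 3.4, 2.0),
  nitrate = c(0.5, 0.7, NA, 1.1)
)

# mutate() returns a modified copy; without this assignment, `toy`
# would stay unchanged and `nob` would not exist in the next step
toy <- mutate(toy, nob = ifelse(!is.na(sulfate) & !is.na(nitrate), 1, 0))

# nob is now a real column, so summarise() can find it
total <- summarise(toy, sum = sum(nob))
print(total)
```

Note that inside mutate the columns can be referred to by bare name, so the dat$ prefix used in the question is not needed.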

Upvotes: 1
