Michael
Michael

Reputation: 11

Using a loop to read, clean, and write, multiple .csv's in r

I have multiple (~ 80) .csv's in a common folder that I need to subset certain columns and write the subset-ed new table into a new .csv. I have a script that I can use to do them one at a time:

setwd("C:/Data/HAAS")

### ENTER COUNTY NAME

county.name <- "charlotte"


####ENTER UNZIPPED CSV DOWNLOADED FROM ftp://sdrftp03.dor.state.fl.us/Tax%20Roll%20Data%20Files/2016%20Final%20NAL%20-%20SDF%20Files/


cnty.raw<- read.csv("C:/Data/HAAS/tax_roll_data/NAL18F201601.csv")
cnty.clean <- ""                                    ## CREATE NEW EMPTY DATAFRAME


cnty.clean$CNTY_ID <- cnty.raw$CO_NO                ## ADD COUNTY ID TO CLEAN FILE
cnty.clean$PARCEL_ID <- cnty.raw$PARCEL_ID          ## ADD PARCEL ID TO CLEAN FILE 
cnty.clean$ACT_BUILT <- cnty.raw$ACT_YR_BLT         ## ADD ACTUAL YEAR BUILT TO CLEAN FILE
cnty.clean$FLUCCS <- cnty.raw$DOR_UC                ## ADD LAND USE COLUMN
cnty.clean$HMSTD_VAL <- cnty.raw$AV_HMSTD           ## USED TO DETERMINE IF PARCEL CONTAINS A HOMESTEAD

cnty.clean <- cnty.clean[cnty.clean$FLUCCS == 001, ]   ## SELECT ONLY SINGLE-FAMILY RESIDENTIAL PROPERTIES 
cnty.clean <- na.omit(cnty.clean)                      ## REMOVE NA VALUES
head(cnty.clean)


#### BEGIN WRITING CLEANED TABLES TO OUTPUT FOLDERS ##################################

setwd("C:/Data/HAAS/R_output/csv")                    ## CSV DESTINATION FOLDER


write.csv(cnty.clean, file = paste(county.name, ".csv", sep = ""))    ## WRITE CSV TO FOLDER

setwd("C:/Data/HAAS/R_output/dbf")                                    ## DBF DESTINATION FOLDER
library(foreign)

write.dbf(cnty.clean, file = paste(county.name, ".dbf", sep = ""))    ## WRITE AS .DBF FOR SHAPEFILE MERGE

#### RESET WORKING DIRECTORY BACK TO ORIGINAL PATH

setwd("C:/Data/HAAS")

head(cnty.clean)
summary(cnty.clean)

the problem is that I have to manualy change the csv file name and also the county name field manually everytime I want to run the code through all of the csv's.

I have used r to do simple tasks, but am an amateur when it comes to automating processes using loops. Is there a way I can insert this code into a loop to run over and over again through all the csv's in the filepath's folder?

NOTE: I have searched around and tried different options in other posts, but none seem to work for my situation.

Upvotes: 0

Views: 3907

Answers (3)

tbradley
tbradley

Reputation: 2280

Assuming all of your csv files have the same column names you can do this:

library(foreign)
setwd("C:/Data/HAAS")
my_func <- function(path, county.name){
  cnty.raw <- read.csv(path)
  cnty.clean <- data.frame()

  cnty.clean$CNTY_ID <- cnty.raw$CO_NO                ## ADD COUNTY ID TO CLEAN FILE
  cnty.clean$PARCEL_ID <- cnty.raw$PARCEL_ID          ## ADD PARCEL ID TO CLEAN FILE 
  cnty.clean$ACT_BUILT <- cnty.raw$ACT_YR_BLT         ## ADD ACTUAL YEAR BUILT TO CLEAN FILE
  cnty.clean$FLUCCS <- cnty.raw$DOR_UC                ## ADD LAND USE COLUMN
  cnty.clean$HMSTD_VAL <- cnty.raw$AV_HMSTD           ## USED TO DETERMINE IF PARCEL CONTAINS A HOMESTEAD

  cnty.clean <- cnty.clean[cnty.clean$FLUCCS == 001, ]   ## SELECT ONLY SINGLE-FAMILY RESIDENTIAL PROPERTIES 
  cnty.clean <- na.omit(cnty.clean)                      ## REMOVE NA VALUES

  write.csv(cnty.clean, file = paste("R_output/csv/", county.name, ".csv", sep = ""))
  write.csv(cnty.clean, file = paste("R_output/dbf/", county.name, ".csv", sep = ""))

}

list_path <- c("tax_roll_data/NAL18F201601.csv", "path/to/second/file.csv")
list_county <- c("charlotte", "second county")

mapply(my_func, path = list_path, county.name = list_county)

NOTE: This changes your file paths to relative paths so you can leave out the "C:/Data/HAAS/" portion in the list of paths.

You can use the list.files function described in the other answer to build your list of paths but this solution should run much faster than a for loop. This, however, does assume you have an equal number of county names and paths. Otherwise it will reuse values from the shorter of the two lists and give you unexpected (and unwanted) results

Upvotes: 1

Lloyd Christmas
Lloyd Christmas

Reputation: 1038

You can get all the file names using list.files. The option full.names = TRUE gives you the full path of the file name, instead of relational. Here's a simple example that should be extensible.

dfiles <- list.files(directory, full.names = TRUE)

county.names <- #vector of counties

for(j in 1:length(county.names)){
   for (i in 1:length(dfiles)){
      #operations go here

       write.csv(cnty.clean, file = paste(county.names[j], ".csv", sep = "")))
   }
}

Upvotes: 3

user3375672
user3375672

Reputation: 3768

You can use f <- list.files('C:/Data/HAAS/tax_roll_data', pattern='.csv') to get the files in your folder C:/Data/HAAS/tax_roll_data into a vector. Then loop through f, something like:

for (i in 1:length(f)){ 
cnty.raw <- read.csv(f[i])

#Rest of your script
}

Upvotes: 0

Related Questions