Reputation: 11
I have multiple (~ 80) .csv's in a common folder that I need to subset certain columns and write the subset-ed new table into a new .csv. I have a script that I can use to do them one at a time:
setwd("C:/Data/HAAS")
### ENTER COUNTY NAME
county.name <- "charlotte"
####ENTER UNZIPPED CSV DOWNLOADED FROM ftp://sdrftp03.dor.state.fl.us/Tax%20Roll%20Data%20Files/2016%20Final%20NAL%20-%20SDF%20Files/
cnty.raw<- read.csv("C:/Data/HAAS/tax_roll_data/NAL18F201601.csv")
cnty.clean <- "" ## CREATE NEW EMPTY DATAFRAME
cnty.clean$CNTY_ID <- cnty.raw$CO_NO ## ADD COUNTY ID TO CLEAN FILE
cnty.clean$PARCEL_ID <- cnty.raw$PARCEL_ID ## ADD PARCEL ID TO CLEAN FILE
cnty.clean$ACT_BUILT <- cnty.raw$ACT_YR_BLT ## ADD ACTUAL YEAR BUILT TO CLEAN FILE
cnty.clean$FLUCCS <- cnty.raw$DOR_UC ## ADD LAND USE COLUMN
cnty.clean$HMSTD_VAL <- cnty.raw$AV_HMSTD ## USED TO DETERMINE IF PARCEL CONTAINS A HOMESTEAD
cnty.clean <- cnty.clean[cnty.clean$FLUCCS == 001, ] ## SELECT ONLY SINGLE-FAMILY RESIDENTIAL PROPERTIES
cnty.clean <- na.omit(cnty.clean) ## REMOVE NA VALUES
head(cnty.clean)
#### BEGIN WRITING CLEANED TABLES TO OUTPUT FOLDERS ##################################
setwd("C:/Data/HAAS/R_output/csv") ## CSV DESTINATION FOLDER
write.csv(cnty.clean, file = paste(county.name, ".csv", sep = "")) ## WRITE CSV TO FOLDER
setwd("C:/Data/HAAS/R_output/dbf") ## DBF DESTINATION FOLDER
library(foreign)
write.dbf(cnty.clean, file = paste(county.name, ".dbf", sep = "")) ## WRITE AS .DBF FOR SHAPEFILE MERGE
#### RESET WORKING DIRECTORY BACK TO ORIGINAL PATH
setwd("C:/Data/HAAS")
head(cnty.clean)
summary(cnty.clean)
the problem is that I have to manualy change the csv file name and also the county name field manually everytime I want to run the code through all of the csv's.
I have used r to do simple tasks, but am an amateur when it comes to automating processes using loops. Is there a way I can insert this code into a loop to run over and over again through all the csv's in the filepath's folder?
NOTE: I have searched around and tried different options in other posts, but none seem to work for my situation.
Upvotes: 0
Views: 3907
Reputation: 2280
Assuming all of your csv files have the same column names you can do this:
library(foreign)
setwd("C:/Data/HAAS")
my_func <- function(path, county.name){
cnty.raw <- read.csv(path)
cnty.clean <- data.frame()
cnty.clean$CNTY_ID <- cnty.raw$CO_NO ## ADD COUNTY ID TO CLEAN FILE
cnty.clean$PARCEL_ID <- cnty.raw$PARCEL_ID ## ADD PARCEL ID TO CLEAN FILE
cnty.clean$ACT_BUILT <- cnty.raw$ACT_YR_BLT ## ADD ACTUAL YEAR BUILT TO CLEAN FILE
cnty.clean$FLUCCS <- cnty.raw$DOR_UC ## ADD LAND USE COLUMN
cnty.clean$HMSTD_VAL <- cnty.raw$AV_HMSTD ## USED TO DETERMINE IF PARCEL CONTAINS A HOMESTEAD
cnty.clean <- cnty.clean[cnty.clean$FLUCCS == 001, ] ## SELECT ONLY SINGLE-FAMILY RESIDENTIAL PROPERTIES
cnty.clean <- na.omit(cnty.clean) ## REMOVE NA VALUES
write.csv(cnty.clean, file = paste("R_output/csv/", county.name, ".csv", sep = ""))
write.csv(cnty.clean, file = paste("R_output/dbf/", county.name, ".csv", sep = ""))
}
list_path <- c("tax_roll_data/NAL18F201601.csv", "path/to/second/file.csv")
list_county <- c("charlotte", "second county")
mapply(my_func, path = list_path, county.name = list_county)
NOTE: This changes your file paths to relative paths so you can leave out the "C:/Data/HAAS/"
portion in the list of paths.
You can use the list.files
function described in the other answer to build your list of paths but this solution should run much faster than a for loop. This, however, does assume you have an equal number of county names and paths. Otherwise it will reuse values from the shorter of the two lists and give you unexpected (and unwanted) results
Upvotes: 1
Reputation: 1038
You can get all the file names using list.files
. The option full.names = TRUE
gives you the full path of the file name, instead of relational. Here's a simple example that should be extensible.
dfiles <- list.files(directory, full.names = TRUE)
county.names <- #vector of counties
for(j in 1:length(county.names)){
for (i in 1:length(dfiles)){
#operations go here
write.csv(cnty.clean, file = paste(county.names[j], ".csv", sep = "")))
}
}
Upvotes: 3
Reputation: 3768
You can use
f <- list.files('C:/Data/HAAS/tax_roll_data', pattern='.csv')
to get the files in your folder C:/Data/HAAS/tax_roll_data into a vector. Then loop through f
, something like:
for (i in 1:length(f)){
cnty.raw <- read.csv(f[i])
#Rest of your script
}
Upvotes: 0