Reputation: 31
I have some files with a YYYYMMDD date code in their names, for example my20150112.csv. How do I write a for loop in R so that it automatically processes the next date after it finishes processing the previous one?
Here is my script:
R_script <- function(file) {
  read.csv(file)
}
For example, how would I make the script run R_script("my20150112.csv") automatically after it runs R_script("my20150111.csv")?
Thanks
Upvotes: 1
Views: 99
Reputation: 6759
Here is an end-to-end solution. The following function checks both the file-name syntax and whether the embedded date is a real calendar date. It also filters the files by a date range [start, end], given as YYYYMMDD strings. See the comments inside the function for more information.
processYYYYMMDDFiles <- function(path = getwd(), start, end) {
  path <- normalizePath(path)
  # The regular expression for a file name: "my" + 8 digits + ".csv"
  regEx <- "^my\\d{8}\\.csv$"
  # A more precise regular expression for a valid YYYYMMDD:
  # ^my(19|20)\\d\\d(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])\\.csv$
  # Finding all candidate files in the path folder
  listFiles <- dir(path = path, pattern = regEx)
  # Selecting only files with valid dates: not just the right format, but a real
  # calendar date (for example, 20170229 is syntactically valid but does not exist)
  # Extracting the date information: YYYYMMDD
  datesOfFiles <- substring(listFiles, 3, 10)
  # Internal function for validating a date (as.Date returns NA for impossible dates)
  checkDate <- function(YYYYMMDDdate) {
    !is.na(as.Date(as.character(YYYYMMDDdate), format = "%Y%m%d"))
  }
  # Checking (checkDate is vectorized, so no sapply is needed)
  datesOfFilesCheck <- checkDate(datesOfFiles)
  # Reporting the files whose dates are not valid
  listFilesNOK <- listFiles[!datesOfFilesCheck]
  if (length(listFilesNOK) > 0) {
    msg <- paste("From folder: '%s' skipping the following files,",
                 "because they do not have a valid date: '%s'")
    message(sprintf(msg, path, toString(listFilesNOK)))
  }
  # Keeping only valid dates within the interval [start, end]; comparing
  # YYYYMMDD strings lexicographically is equivalent to comparing the dates
  validIdx <- datesOfFilesCheck & (datesOfFiles >= start) & (datesOfFiles <= end)
  # Sorting: chronological order, since all names share the "my" prefix
  listFiles <- sort(listFiles[validIdx])
  print(sprintf("Processing files from folder: '%s'", path))
  # seq_along() avoids iterating over 1:0 when no file matches
  for (i in seq_along(listFiles)) {
    iFile <- listFiles[i]
    # Here come the additional tasks for this function, e.g.
    # df <- read.csv(file.path(path, iFile))
    print(sprintf("Processing file: '%s'", iFile))
  }
}
Now let's test the function by creating some empty files in a temporary directory:
# Testing
files <- c("my20170101.csv", "my20170110.csv", "my20170215.csv", "my20170229.csv",
           "my20170315.csv", "my20170820.csv")
tmpDir <- tempdir()
# file.create() truncates any existing file with the same name
file.create(file.path(tmpDir, files))
processYYYYMMDDFiles(path = tmpDir, start = "20170101", end = "20170330")
print("Removing the testing files...")
print(file.remove(file.path(tmpDir, files)))
It produces the following output:
> source("~/R-workspace/projects/samples/samples/processYYYYMMDDFiles.R", encoding = "Windows-1252")
From folder: 'C:\Users\dleal\AppData\Local\Temp\RtmpoZLcNS' skipping the following files, because they do not have a valid date: 'my20170229.csv'
[1] "Processing files from folder: 'C:\\Users\\dleal\\AppData\\Local\\Temp\\RtmpoZLcNS'"
[1] "Processing file: 'my20170101.csv'"
[1] "Processing file: 'my20170110.csv'"
[1] "Processing file: 'my20170215.csv'"
[1] "Processing file: 'my20170315.csv'"
[1] "Removing the testing files..."
[1] TRUE TRUE TRUE TRUE TRUE TRUE
I hope this helps.
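If the files are daily and you already know the date range, you can skip scanning the folder and build the expected file names from a date sequence instead. The following is a minimal sketch, assuming one file per day named myYYYYMMDD.csv in the working directory; `dailyFileNames` is a hypothetical helper, and `R_script` is the function from the question. Missing days are reported and skipped rather than stopping the loop.

```r
# Hypothetical helper: expected file names for every day in [start, end]
dailyFileNames <- function(start, end, prefix = "my") {
  days <- seq(as.Date(start), as.Date(end), by = "day")
  paste0(prefix, format(days, "%Y%m%d"), ".csv")
}

# The function from the question
R_script <- function(file) {
  read.csv(file)
}

# Process the days in chronological order; each iteration starts only after
# the previous file has been processed
for (f in dailyFileNames("2015-01-11", "2015-01-12")) {
  if (file.exists(f)) {
    df <- R_script(f)
    # ... further processing of df ...
  } else {
    message("Missing file, skipped: ", f)
  }
}
```

Because seq() works on real dates, month ends and leap days are handled automatically, so no invalid names such as my20170229.csv are ever generated.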
Upvotes: 0
Reputation: 32558
Here's an approach:
library(anytime)                          # We'll use the anytime package
files = dir(pattern = "\\.csv$")          # Obtain the names of all .csv files
file_dates = gsub("[^0-9]", "", files)    # Keep only the digits in each file name
file_dates = anydate(file_dates)          # Convert the digit strings to dates
files = files[order(file_dates)]          # Order the files by date
for (i in seq_along(files)) {             # Run your operations
  df = read.csv(file = files[i])
  # YOUR CODE
}
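If you would rather not depend on the anytime package, base R's as.Date() with an explicit format performs the same conversion. Here is a minimal sketch of the ordering step, wrapped in a hypothetical helper `orderByFileDate`; it assumes each file name contains only the eight date digits, since gsub() keeps every digit it finds.

```r
# Hypothetical helper: order file names chronologically by their YYYYMMDD digits
orderByFileDate <- function(files) {
  file_dates <- as.Date(gsub("[^0-9]", "", files), format = "%Y%m%d")
  # Names whose digits are not a valid date become NA and are placed last
  files[order(file_dates)]
}
```

In the loop from the answer you would then write files = orderByFileDate(dir(pattern = "\\.csv$")) and iterate over the result as before.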
Upvotes: 1
Reputation: 1994
Assuming your individual files all have the same format:
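setwd("<directory where files are>")  # Replace with the folder that holds your files
# list.files() returns names sorted alphabetically, which for myYYYYMMDD.csv is also chronological
for (x in list.files(pattern = "\\.csv$")) {
  file <- read.table(x, header = TRUE, sep = ",")  # Not sure if you have headers or not
  # Store each data frame in a variable named after its file, e.g. `my20150112.csv`
  assign(x = x, value = file, envir = .GlobalEnv)
}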
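Instead of creating one global variable per file with assign(), a common alternative is to keep all the data frames in a single named list. The following is a minimal sketch; `readAllCsv` is a hypothetical helper, and it assumes every .csv in the folder can be read with read.csv() defaults.

```r
# Hypothetical helper: read every .csv in 'path' into a list named by file name
readAllCsv <- function(path = ".") {
  # Sorted alphabetically, so myYYYYMMDD.csv files come out in date order
  files <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  setNames(lapply(files, read.csv), basename(files))
}
```

Usage: dfs <- readAllCsv("<directory where files are>"), then dfs[["my20150112.csv"]] gives the data frame for that day, and for (name in names(dfs)) walks through the days in order.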
Upvotes: 0