erver

Reputation: 31

Reading files in For-loop fashion in R

I have some files with a YYYYMMDD date code in their names, for example my20150112.csv. How do I write a for loop in R so that R automatically processes the next date after it finishes the previous one? Here is the script:

R_script <- function(file) {
   read.csv(file)
}

For example, how would you make the script run R_script("my20150112.csv") automatically after it has run R_script("my20150111.csv")?

Thanks

Upvotes: 1

Views: 99

Answers (3)

David Leal

Reputation: 6759

Here is an end-to-end solution. The following function validates the file name syntax and checks that the date in the name is a real calendar date. It also filters the files by a date range. See the comments inside the function for more information.

processYYYYMMDDFiles <- function(path = getwd(), start, end) {
    path <- normalizePath(path)
    # The regular expression for a file
    regEx = "^my\\d{8}\\.csv$"
    # A more precise regular expression for a valid YYYYMMDD
    # ^my(19|20)\\d\\d(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])\\.csv$

    # Finding all possible files in the path folder
    listFiles = dir(path = path, pattern=regEx)

    # Keeping only files whose date is a real calendar date, not just 8 digits
    # (for example, 20170229 matches the pattern but is not a valid date).
    # Extracting the date part of the name: YYYYMMDD
    datesOfFiles <- substring(listFiles, 3, 10)

    # Internal function for validating a date: as.Date returns NA
    # when the string is not a real date in the given format
    checkDate <- function(YYYYMMDDdate) {
        return(!is.na(as.Date(as.character(YYYYMMDDdate),
        tz = 'UTC', format = '%Y%m%d')))
    }

    # Checking
    datesOfFilesCheck <- sapply(datesOfFiles, checkDate)

    # Reporting about not valid dates
    listFilesNOK = listFiles[datesOfFilesCheck == FALSE]
    if (length(listFilesNOK) > 0) {
        msg = paste("From folder: '%s' skiping the following files,",
                "because they have not a valid date:'%s'")
        msg = sprintf(msg, path, toString(listFilesNOK))
        message(msg)
    }

    # Filtering for only valid date within the interval [start, end]
    validIdx <- (datesOfFilesCheck == TRUE) & 
        (datesOfFiles >= start) & (datesOfFiles <= end)
    listFiles <- listFiles[validIdx]
    listFiles <- listFiles[order(listFiles)] # sorting
    nFiles = length(listFiles)

    print(sprintf("Processing files from folder: '%s'", path))
    for (i in seq_len(nFiles)) { # seq_len avoids iterating over c(1, 0) when no file matches
        iFile = listFiles[i]
        # Here comes the additional tasks for this function
        print(sprintf("Processing file: '%s'", iFile))
    }
}
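
The date check inside the function can also be tried on its own. This is a minimal sketch, assuming the same '%Y%m%d' format; is_valid_yyyymmdd is a hypothetical helper name, not part of the function above:

```r
# Hypothetical helper: TRUE when a string is a real calendar date in YYYYMMDD form
is_valid_yyyymmdd <- function(x) {
    # as.Date() returns NA when the string does not parse as a real date
    !is.na(as.Date(x, format = "%Y%m%d"))
}

# 2017 is not a leap year, 2016 is
dates <- c("20170101", "20170229", "20160229")
valid <- is_valid_yyyymmdd(dates)
print(valid)
```

Only the second entry should be rejected, which is why my20170229.csv is the kind of file the function skips.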

Now testing the function creating temporary files in a temp directory:

# Testing
files <- c("my20170101.csv", "my20170110.csv", "my20170215.csv", "my20170229.csv", 
   "my20170315.csv", "my20170820.csv")
tmpDir <- tempdir()
file.create(file.path(tmpDir, files)) # file.create has no overwrite argument; it truncates existing files

processYYYYMMDDFiles(path = tmpDir, start="20170101", end="20170330")
print("Removing the testing files...")
print(file.remove(file.path(tmpDir, files)))

It produces the following output:

> source("~/R-workspace/projects/samples/samples/processYYYYMMDDFiles.R", encoding = "Windows-1252")
From folder: 'C:\Users\dleal\AppData\Local\Temp\RtmpoZLcNS' skiping the following files, because they have not a valid date:'my20170229.csv'
[1] "Processing files from folder: 'C:\\Users\\dleal\\AppData\\Local\\Temp\\RtmpoZLcNS'"
[1] "Processing file: 'my20170101.csv'"
[1] "Processing file: 'my20170110.csv'"
[1] "Processing file: 'my20170215.csv'"
[1] "Processing file: 'my20170315.csv'"
[1] "Removing the testing files..."
[1] TRUE TRUE TRUE TRUE TRUE TRUE

I hope this helps.

Upvotes: 0

d.b

Reputation: 32558

Here's an approach that sorts the files by the date in their names before looping over them:

files = dir(pattern = "\\.csv$") # Names of all CSV files in the working directory
file_dates = gsub("[^0-9]", "", files) # Keep only the digits of each name (the date)
library(anytime) # anydate() comes from the anytime package
file_dates = anydate(file_dates) # Convert the digit strings to dates
files = files[order(file_dates)] # Order the files by date

for (i in seq_along(files)) { # Process the files in date order
    df = read.csv(file = files[i])
    # YOUR CODE
}
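
If installing a package is not an option, the same ordering can probably be done in base R with as.Date. The sketch below is an illustration under assumptions, not the method of the answer above: the folder name and the three example file names are made up, and the files are created empty in a temporary directory so the snippet runs on its own.

```r
# Create a few empty example files in a temporary folder (assumed names)
tmp <- file.path(tempdir(), "date_files")
dir.create(tmp, showWarnings = FALSE)
example_names <- c("my20150112.csv", "my20150111.csv", "my20141231.csv")
file.create(file.path(tmp, example_names))

# List the matching files and pull the 8-digit date out of each name
files <- list.files(tmp, pattern = "^my\\d{8}\\.csv$")
file_dates <- as.Date(sub("^my(\\d{8})\\.csv$", "\\1", files), format = "%Y%m%d")

# Sort chronologically; the loop then visits the files in date order
files_sorted <- files[order(file_dates)]
for (f in files_sorted) {
    message("Processing ", f)  # a real run would call R_script(file.path(tmp, f)) here
}
```

Because the names share one prefix and use YYYYMMDD, plain alphabetical order already matches date order; converting to dates mainly guards against names that do not hold a real date.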

Upvotes: 1

David C.

Reputation: 1994

Assuming your individual files all have the same format:

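setwd("<directory where files are>")  # replace with the folder that holds the files
for (x in list.files()) {
  # sep="," because the files are CSV; not sure if you have headers or not
  file <- read.table(x, header=TRUE, sep=",")
  assign(x=x, value=file, envir=.GlobalEnv)  # one data frame per file name
}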
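
A variation some people prefer is to collect the data frames in a named list instead of creating one global variable per file. This is only a sketch under assumptions: the folder name and the two small example files are invented here so that the snippet is self-contained.

```r
# Write two small example CSV files into a temporary folder (assumed data)
tmp <- file.path(tempdir(), "csv_list_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(a = 1:2), file.path(tmp, "my20150111.csv"), row.names = FALSE)
write.csv(data.frame(a = 3:5), file.path(tmp, "my20150112.csv"), row.names = FALSE)

# Read every CSV file into one list, named by file name
paths <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(paths, read.csv)
names(tables) <- basename(paths)
```

Each table is then available as, for example, tables[["my20150112.csv"]], and the whole set can be processed with another lapply call.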

Upvotes: 0
