Reputation: 35
I have a large set of csv files in a single directory. These files contain two columns, Date and Price. The filename of filename.csv contains the unique identifier of the data series. I understand that missing values for merged data series can be handled when these time series data are zoo objects. I also understand that, by using na.locf(merge( )), I can fill in the missing values with the most recent observations.
I want to automate the process of reading the *.csv file columnar Date and Price data into R dataframes and merging the resulting series with MergedData <- na.locf(merge( )). The ultimate goal, of course, is to use the fPortfolio package.
I've used the following statement to create a data frame of Date,Price pairs. The problem with this approach is that I lose the <filename> identifier of the time series data from the files.
result <- lapply(files, function(x) x <- read.csv(x) )
I understand that I can write code to generate the R statements required to do all these steps instance by instance. I'm wondering if there is some approach that wouldn't require me to do that. It's hard for me to believe that others haven't wanted to perform this same task.
Upvotes: 1
Views: 1451
Reputation: 269346
Try this:
z <- read.zoo(files, header = TRUE, sep = ",")
z <- na.locf(z)
I have assumed a header line and lines like 2000-01-31,23.40. Use whatever read.zoo arguments are necessary to accommodate whatever format you have.
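To make the idea concrete, here is a minimal sketch of the same approach on two small sample files. The temporary directory, the file names AAA.csv and BBB.csv, and the prices are all made up for illustration, and the explicit format argument assumes dates written as yyyy-mm-dd. Setting the column names from the file names afterwards is one way to keep the series identifiers.

```r
library(zoo)

# Hypothetical sample data: two small CSV files in a temporary directory
dir <- tempfile("series_")
dir.create(dir)
writeLines(c("Date,Price", "2000-01-31,23.40", "2000-02-29,24.10", "2000-03-31,25.00"),
           file.path(dir, "AAA.csv"))
writeLines(c("Date,Price", "2000-01-31,10.00", "2000-03-31,11.50"),
           file.path(dir, "BBB.csv"))

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# A vector of file names is read file by file and merged on the Date index
z <- read.zoo(files, header = TRUE, sep = ",", format = "%Y-%m-%d")

# Label each column with its file name, minus directory and extension
colnames(z) <- tools::file_path_sans_ext(basename(files))

# Carry the last observation forward into the gaps left by the merge
filled <- na.locf(z)
```

In this sketch BBB has no row for 2000-02-29, so the merge leaves an NA there and na.locf replaces it with the January value.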
Upvotes: 2
Reputation: 121568
You can have better formatting using sapply (it keeps the file names). Here I will keep lapply.
Use list.files to collect the files; it is very handy for such a workflow.
Use read.zoo to get zoo objects directly (this avoids coercing them later).
For example:
zoo.objs <- lapply(list.files(path = MY_FILES_DIRECTORY,
                              pattern = '^zoo_.*\\.csv$',  ## regex: csv files whose names start with zoo_
                              full.names = TRUE),          ## to get full names: path + filename
                   read.zoo, header = TRUE, sep = ",")     ## assumes a header line and comma-separated columns
I now use list.files again to name my result:
names(zoo.objs) <- list.files(path = MY_FILES_DIRECTORY,
                              pattern = '^zoo_.*\\.csv$')
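Once the list carries the file names, one possible way to finish the job is to merge its elements and fill the gaps, roughly as below. The two zoo series here are made-up stand-ins for the objects read from disk, and dropping the .csv extension is just a suggestion so that the column names are the bare series identifiers.

```r
library(zoo)

# Hypothetical stand-ins for the named list of zoo objects built from the files
zoo.objs <- list(
  zoo_a.csv = zoo(c(1.0, 2.0, 3.0), as.Date(c("2000-01-31", "2000-02-29", "2000-03-31"))),
  zoo_b.csv = zoo(c(5.0, 6.0),      as.Date(c("2000-01-31", "2000-03-31")))
)

# Drop the .csv extension so the identifiers are plain series names
names(zoo.objs) <- tools::file_path_sans_ext(names(zoo.objs))

# merge() uses the list names as column names;
# na.locf() then carries the last observation forward into the gaps
MergedData <- na.locf(do.call(merge, zoo.objs))
```

The result is a single zoo object with one column per file, which keeps the identifier that the plain lapply(files, read.csv) approach loses.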
Upvotes: 1