Reputation: 13
I have a file in the format below (Note: ignore the blank line between each row. I inserted it so it is formatted properly for the post)
MSFT Equity, 2014-03-03, 23.5, 42.3, 2014-03-04, 23.6, 42.5, ..........
INTC Equity, 2014-03-03, 23.5, 42.3, 2014-03-04, 23.6, 42.5, ..........
AAPL Equity, 2014-03-03, 23.5, 42.3, 2014-03-04, 23.6, 42.5, .......... . . .
In each row, the first column represents ticker. Then it has date followed by two data points repeating itself for many dates. The row length can be very long running in thousands depending on the start date. The number of rows can run in hundreds depends on the number of tickers.
Ideally, I would like to create a data frame for each row with date as index and two data points as two columns. Then I can have a list of data frames with ticker as the index for the list. Alternatively, instead of a data frame, two time series objects for each of the two data points for each row would also work.
Any help in implementing this efficiently would be greatly appreciated. The pseudo code of my implementation is probably not efficient, which can bog down the computer when dealing with very large files.
1) rf <- read.csv(infl, header = FALSE, sep = ",", quote = "",
dec = ".", fill = FALSE, comment.char = "")
2) Read each row and each comma seperated data
3) Then create xts object for each date and data point pair and use rbind to build the full timeseries for each row and for each of the two data points.
e.g. datapt1 <- xts(price1,dt) (e.g.2014-03-03, 23.5)
datapt2 <- xts(price2,dt) (e.g. 2014-03-03, 42.3)
fullxts1 <- rbind(fullxts1,datapt1) #
fullxts2 <- rbind(fullxts2,datapt2)
4) create a dataframe for each row by utilizing two xts object created above.
5) create a list object with ticker as index and dataframe as the element.
I hope my description of the above problem is clear. Thanks in advance for the help
Upvotes: 1
Views: 239
Reputation: 206167
I can't tell if you want xts
objects in the end or not. But here's one way to possibly read the data.
dd<-read.table(text="MSFT Equity, 2014-03-03, 23.5, 42.3, 2014-03-04, 23.6, 42.5
INTC Equity, 2014-03-03, 23.5, 42.3, 2014-03-04, 23.6, 42.5
AAPL Equity, 2014-03-03, 23.5, 42.3, 2014-03-04, 23.6, 42.5", sep=",")
ticker<-rep(dd[,1], (ncol(dd)-1)/3)
datevals<-as.Date(unlist(dd[, seq(2, ncol(dd), by=3)]))
var1vals<-unlist(dd[, seq(3, ncol(dd), by=3)])
var2vals<-unlist(dd[, seq(3, ncol(dd), by=3)])
dd<-data.frame(ticker, datevals, var1vals, var2vals)
xtslist <- by(dd, dd$ticker, function(x) xts(x[,3:4], order.by=x[,2]))
Here xtslist
is a list (of class "by" so it prints a little funny) with each element corresponding to an xts object for each of the rows.
Upvotes: 0