I am trying to reclass the following xts series: columns 1-8 are character but are supposed to be numeric, while columns 9-10 are character, as they are supposed to be.
# data
x <- structure(c(NA, NA, "41.95", "30.55", "29.05", "23.71", NA, "23.80",
NA, NA, "18.67", NA, "16.90", "17.10", "14.90", "13.64", "12.70",
"11.65", "10.75", " 9.75", " 9.05", " 7.95", " 6.70", " 6.02",
" 5.05", NA, NA, " 0.00", " 0.00", " 0.28", "-0.29", NA, " 0.00",
NA, NA, "-1.28", NA, "-1.10", " 0.00", "-0.30", "-1.51", "-1.50",
"-2.66", "-1.50", "-1.45", "-1.15", "-0.45", "-0.75", "-1.38",
"-0.45", "48.20", "43.20", "38.20", "33.20", "28.25", "23.30",
"22.25", "21.25", "20.30", "19.35", "18.35", "17.40", "16.35",
"15.50", "14.50", "13.55", "12.55", "11.55", "10.60", " 9.65",
" 8.65", " 7.70", " 6.80", " 5.90", " 5.00", "48.80", "43.80",
"38.80", "33.80", "28.65", "23.65", "22.80", "21.65", "20.65",
"19.65", "18.65", "17.70", "16.70", "15.65", "14.70", "13.70",
"12.65", "11.75", "10.75", " 9.80", " 8.80", " 7.85", " 6.95",
" 6.00", " 5.10", " 0", " 0", " 21", " 27", " 0",
" 356", " 0", " 82", " 0", " 0", " 323", " 0",
" 444", " 242", " 223", " 1304", " 362", " 263", " 126",
" 690", " 1445", " 624", " 476", " 995", " 730", NA,
NA, NA, NA, " 71", " 131", NA, NA, NA, NA, " 435", NA, " 42",
NA, " 171", " 423", " 83", " 39", " 20", " 6", " 124",
" 42", " 177", " 425", " 344", " 65.00", " 70.00", " 75.00",
" 80.00", " 85.00", " 90.00", " 91.00", " 92.00", " 93.00", " 94.00",
" 95.00", " 96.00", " 97.00", " 98.00", " 99.00", "100.00", "101.00",
"102.00", "103.00", "104.00", "105.00", "106.00", "107.00", "108.00",
"109.00", NA, NA, " 0.00", " 0.00", " 0.97", " -1.21", NA,
" 0.00", NA, NA, " -6.42", NA, " -6.11", " 0.00", " -1.97",
" -9.97", "-10.56", "-18.59", "-12.24", "-12.95", "-11.27", " -5.36",
"-10.07", "-18.65", " -8.18", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "C", "Sep 25, 2015", "Sep 25, 2015", "Sep 25, 2015",
"Sep 25, 2015", "Sep 25, 2015", "Sep 25, 2015", "Sep 25, 2015",
"Sep 25, 2015", "Sep 25, 2015", "Sep 25, 2015", "Sep 25, 2015",
"Sep 25, 2015", "Sep 25, 2015", "Sep 25, 2015", "Sep 25, 2015",
"Sep 25, 2015", "Sep 25, 2015", "Sep 25, 2015", "Sep 25, 2015",
"Sep 25, 2015", "Sep 25, 2015", "Sep 25, 2015", "Sep 25, 2015",
"Sep 25, 2015", "Sep 25, 2015"), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1442534400,
1442534400, 1442534400, 1442534400, 1442534400, 1442534400, 1442534400,
1442534400, 1442534400, 1442534400, 1442534400, 1442534400, 1442534400,
1442534400, 1442534400, 1442534400, 1442534400, 1442534400, 1442534400,
1442534400, 1442534400, 1442534400, 1442534400, 1442534400, 1442534400
), tzone = "UTC", tclass = "Date"), .Dim = c(25L, 10L), .Dimnames = list(
NULL, c("p", "c", "b", "a", "oi", "vol", "strike", "cp",
"callput", "expiry")))
What I have done is turned columns 1-8 into numeric by using the following:
xx <- reclass(apply(x[,1:8], 2, as.numeric), x)
but when I try to combine it with the last two character columns in x, called expiry and callput, it turns the character columns into NA:
xy <- merge.xts(xx, x[,9:10])
How can I work around this?
Upvotes: 2
Views: 626
Reputation: 28938
xts is a matrix beneath the surface, so it must be all numeric or all character. With financial applications it normally has to be numeric, so the question becomes what to do with the character columns.
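The all-or-nothing typing can be seen with a plain base R matrix. This is only a sketch with made-up values (the names num, chr and m are mine, not from the question), but the NA result looks like the same coercion that shows up after the merge:

```r
# Toy values, not taken from the question's data
num <- c(41.95, 30.55)
chr <- c("C", "P")

# Mixing types in one matrix forces a single common type
m <- cbind(price = num, callput = chr)
type_before <- typeof(m)        # "character": the numbers became strings

# Forcing the matrix back to numeric turns the non-numeric strings into NA
suppressWarnings(storage.mode(m) <- "double")
type_after <- typeof(m)         # "double"
m                               # price column survives, callput column is all NA
```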
If a character data column can only be one of a few possible values then you actually have a factor. Your call/put column fits this:
as.numeric( factor( c("C","C","P"), levels=c("C","P" ) ) ) #1 1 2
Obviously, you need to know in advance all your factor levels.
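A small sketch of going both ways, assuming "C" and "P" are the only possible values (lv, codes and decoded are names I made up for illustration):

```r
# Assumed complete set of levels for the call/put column
lv <- c("C", "P")
callput <- c("C", "C", "P")     # toy input

# Encode: each string becomes its position in lv
codes <- as.numeric(factor(callput, levels = lv))   # 1 1 2

# Decode: index back into the same level vector
decoded <- lv[codes]                                # "C" "C" "P"

# A value outside the known levels silently becomes NA
unknown <- as.numeric(factor("X", levels = lv))     # NA
```

Keeping lv in one place matters, since the numeric codes only mean something relative to that exact ordering.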
If a character column is actually a datestamp, such as your option expiry column, then there are two ways to convert it to a number. One is to use as.numeric directly:
as.numeric(as.Date("Sep 25, 2015", "%b %d, %Y")) #16703
The other is as an 8-digit YYYYMMDD number:
as.numeric(format(as.Date("Sep 25, 2015", "%b %d, %Y"), "%Y%m%d")) #20150925
I prefer the latter, as it is more readable. (But the former if you want to do date arithmetic directly on it.)
Timestamps and times of day can be handled in the same way.
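Here is a rough round trip for both encodings, plus a time of day as an HHMMSS number. The variable names are just for illustration, and the Sys.setlocale call is my addition, on the assumption that "%b" should match English month abbreviations:

```r
# "%b" depends on the locale; the "C" locale uses English month abbreviations
invisible(Sys.setlocale("LC_TIME", "C"))

expiry <- as.Date("Sep 25, 2015", "%b %d, %Y")

# Encode: days since 1970-01-01, or a readable YYYYMMDD number
days     <- as.numeric(expiry)                       # 16703
yyyymmdd <- as.numeric(format(expiry, "%Y%m%d"))     # 20150925

# Decode each form back to a Date
from_days <- as.Date(days, origin = "1970-01-01")
from_ymd  <- as.Date(as.character(yyyymmdd), "%Y%m%d")

# Time of day as HHMMSS, using a made-up timestamp
ts     <- as.POSIXct("2015-09-25 14:30:00", tz = "UTC")
hhmmss <- as.numeric(format(ts, "%H%M%S"))           # 143000
```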
If you have a character string that does not fit the above cases, the choices are less palatable:
- Use a data.frame. (Datestamps in the rownames; you can still rbind new rows in; subset out the columns of interest into an xts object when that is what you need.)
- Use separate xts objects (nasty code smell).
- [...] xts object that is an xts object. (I've used this when the character strings are only for a subset of the datestamps in the main xts object.)
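The data.frame route might look something like this. It is a sketch with toy rows: the column names follow the question, the values and dates are made up, and it assumes the xts package is installed:

```r
library(xts)

# Datestamps live in the rownames; numeric and character columns sit side by side
df <- data.frame(
  p       = c(41.95, 30.55),
  strike  = c(75, 80),
  callput = c("C", "P"),
  row.names = c("2015-09-18", "2015-09-21"),
  stringsAsFactors = FALSE
)

# New rows can still be appended with rbind
df <- rbind(df, data.frame(p = 29.05, strike = 85, callput = "C",
                           row.names = "2015-09-22",
                           stringsAsFactors = FALSE))

# Pull only the numeric columns into an xts object when one is needed
px <- xts(as.matrix(df[, c("p", "strike")]),
          order.by = as.Date(rownames(df)))
is.numeric(coredata(px))   # TRUE
```

Note that data.frame rownames must be unique, so this particular layout would not work as-is for data with repeated datestamps like the sample in the question.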