Reputation: 2233
I'm working on an IBM data set obtained with quantmod. I created two variables and then used the glm function to see the relation between them. The code ran fine, but then I noticed that part of the data frame contains NAs. How can I overcome this issue?
Here is my code:
library("quantmod")
getSymbols("IBM")
dim(IBM)
IBM$CurrtDay_up <- ifelse(IBM$IBM.Open < IBM$IBM.Close,1,0)
IBM$LastDay_green <- ifelse((lag(IBM$IBM.Open,k=1) < lag(IBM$IBM.Close,k=1)),1,0)
head(IBM)
IBM.Open IBM.High IBM.Low IBM.Close IBM.Volume IBM.Adjusted CurrtDay_up LastDay_green
2007-01-03 97.18 98.40 96.26 97.27 9196800 82.78498 1 NA
2007-01-04 97.25 98.79 96.88 98.31 10524500 83.67011 1 1
2007-01-05 97.60 97.95 96.91 97.42 7221300 82.91264 0 1
2007-01-08 98.50 99.50 98.35 98.90 10340000 84.17225 1 0
2007-01-09 99.08 100.33 99.07 100.07 11108200 85.16802 1 1
2007-01-10 98.50 99.05 97.93 98.89 8744800 84.16374 1 1
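The NA in the first row of LastDay_green comes from the one-day lag: the first trading day has no previous day to compare against. A base-R sketch of the same two indicators, using the first five Open/Close values from the head(IBM) output and a hand-written one-step lag (lag1() is a hypothetical helper standing in for xts's lag()):

```r
# First five Open/Close prices taken from the head(IBM) output
open  <- c(97.18, 97.25, 97.60, 98.50, 99.08)
close <- c(97.27, 98.31, 97.42, 98.90, 100.07)

# 1 if the day closed above its open, else 0
curr_up <- ifelse(open < close, 1, 0)

# Hypothetical helper: shift a vector down by one; the first element has no predecessor
lag1 <- function(x) c(NA, head(x, -1))

# Previous day's indicator; NA < NA is NA, so the first entry stays NA
last_green <- ifelse(lag1(open) < lag1(close), 1, 0)

print(curr_up)     # 1 1 0 1 1
print(last_green)  # NA 1 1 0 1
```

Any model fitted on these columns has to deal with that leading NA, either by dropping the row or by telling glm() how to handle it.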
Then I added the glm function:
IBM_1 <- IBM[3:1000,] # to avoid the first row's NA.
glm_greenDay <- glm(CurrtDay_up~LastDay_green,data=IBM_1,family=binomial(link='logit'))
IBM_1$glm_pred<-predict(glm_greenDay,newdata=IBM_1,type='response')
head(IBM_1)
IBM.Open IBM.High IBM.Low IBM.Close IBM.Volume IBM.Adjusted CurrtDay_up LastDay_green glm_pred
2007-01-04 NA NA NA NA NA NA NA NA 0.5683453
2007-01-05 97.60 97.95 96.91 97.42 7221300 82.91264 0 1 NA
2007-01-07 NA NA NA NA NA NA NA NA 0.5407240
2007-01-08 98.50 99.50 98.35 98.90 10340000 84.17225 1 0 NA
2007-01-08 NA NA NA NA NA NA NA NA 0.5683453
2007-01-09 99.08 100.33 99.07 100.07 11108200 85.16802 1 1 NA
UPDATED CODE (note that one row, row #2, has been duplicated):
IBM_1<-IBM[complete.cases(IBM[1:1000,]),] # to avoid the first row's NA.
glm_greenDay<-glm(CurrtDay_up~LastDay_green,data=IBM_1,family=binomial(link='logit'))
IBM_1$glm_pred<-glm_greenDay$fitted.values
head(IBM_1)
IBM.Open IBM.High IBM.Low IBM.Close IBM.Volume IBM.Adjusted CurrtDay_up LastDay_green glm_pred
2007-01-03 NA NA NA NA NA NA NA NA 0.5691203
2007-01-04 97.25 98.79 96.88 98.31 10524500 83.67011 1 1 NA
2007-01-04 NA NA NA NA NA NA NA NA 0.5691203
2007-01-05 97.60 97.95 96.91 97.42 7221300 82.91264 0 1 NA
2007-01-07 NA NA NA NA NA NA NA NA 0.5407240
2007-01-08 98.50 99.50 98.35 98.90 10340000 84.17225 1 0 NA
Upvotes: 1
Views: 97
Reputation: 5335
The problem arises because the output of predict() is not an xts object. The elements of the vector of predicted values have dates as names, but the vector is still a plain numeric vector without a time index. I was able to get a simple call to merge() to work, without dropping NAs before modeling, by converting the output of predict() to class xts first:
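library(quantmod)
getSymbols("IBM")
# 1 if the day closed above its open, else 0
IBM$CurrtDay_up <- ifelse(IBM$IBM.Open < IBM$IBM.Close, 1, 0)
# same indicator for the previous trading day (NA on the first row)
IBM$LastDay_green <- ifelse((lag(IBM$IBM.Open, k=1) < lag(IBM$IBM.Close, k=1)), 1, 0)
# na.exclude pads the predictions with NA for rows dropped because of missing values
glm_greenDay <- glm(CurrtDay_up~LastDay_green, data=IBM, family=binomial(link='logit'), na.action=na.exclude)
glm_pred <- predict(glm_greenDay, type='response')
# turn the date-named numeric vector into an xts series indexed by date
glm_pred_xts <- xts(x = glm_pred, order.by = as.Date(names(glm_pred)))
# merge() aligns the two series on their date index
IBM2 <- merge(IBM, glm_pred_xts)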
That seems to produce the desired output:
> head(glm_pred)
2007-01-03 2007-01-04 2007-01-05 2007-01-08 2007-01-09 2007-01-10
NA 0.5383952 0.5383952 0.5383065 0.5383952 0.5383952
> head(IBM2)
IBM.Open IBM.High IBM.Low IBM.Close IBM.Volume IBM.Adjusted CurrtDay_up LastDay_green glm_pred_xts
2007-01-03 97.18 98.40 96.26 97.27 9196800 82.78498 1 NA NA
2007-01-04 97.25 98.79 96.88 98.31 10524500 83.67011 1 1 0.5383952
2007-01-05 97.60 97.95 96.91 97.42 7221300 82.91264 0 1 0.5383952
2007-01-08 98.50 99.50 98.35 98.90 10340000 84.17225 1 0 0.5383065
2007-01-09 99.08 100.33 99.07 100.07 11108200 85.16802 1 1 0.5383952
2007-01-10 98.50 99.05 97.93 98.89 8744800 84.16374 1 1 0.5383952
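The na.action = na.exclude argument is what keeps the predictions the same length as the data: unlike the default na.omit, it makes predict() and fitted() pad the result with NA for rows dropped because of missing values. A minimal base-R sketch on a small made-up data frame (hypothetical columns up and green standing in for CurrtDay_up and LastDay_green), assuming the default options(na.action = "na.omit"):

```r
# Made-up data; the first row has a missing predictor, like LastDay_green
df <- data.frame(
  up    = c(1, 1, 0, 1, 1, 0, 1, 0),
  green = c(NA, 1, 1, 0, 1, 1, 0, 0)
)

# Default na.action (na.omit): fitted values exist only for the 7 complete rows
fit_omit <- glm(up ~ green, data = df, family = binomial(link = "logit"))
n_omit <- length(fitted(fit_omit))

# na.exclude: predictions are padded with NA back to the original 8 rows
fit_excl <- glm(up ~ green, data = df, family = binomial(link = "logit"),
                na.action = na.exclude)
pred <- predict(fit_excl, type = "response")

# Lengths and row order now match, so the predictions can be attached directly
df$pred <- pred
print(n_omit)
print(df)
```

With the default na.omit, the 7 fitted values would have to be matched back to 8 rows by hand, which is where misalignment can creep in.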
Upvotes: 1
Reputation: 11
The problem might be in how you're constructing your final data frame and how R handles NAs.
The way I read your code, you're adding the result column to the data frame with:
IBM_1$glm_pred<-glm_greenDay$fitted.values
You might be able to put your result into a separate object and use cbind to attach it to the rest of your data frame without propagating the NAs across columns.
Maybe...
glm_pred<-matrix(glm_greenDay$fitted.values,ncol=1)
IBM_glm<-cbind(IBM_1,glm_pred)
I don't know if it's the most elegant approach, but it might be a start.
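Binding with cbind works by position, so it is only safe when the fitted values come from exactly the rows you bind them to. A base-R sketch on a small made-up data frame (hypothetical columns up and green, not the IBM series) that filters with complete.cases() on the same object it then models and binds to. For xts objects cbind() is handled by xts's own method, so this only illustrates the positional idea on a plain data.frame:

```r
# Made-up data with a missing predictor in the first row
df <- data.frame(
  up    = c(1, 1, 0, 1, 1, 0, 1, 0),
  green = c(NA, 1, 1, 0, 1, 1, 0, 0)
)

# Keep only the complete rows of the same object that will be modelled
df_cc <- df[complete.cases(df), ]

fit <- glm(up ~ green, data = df_cc, family = binomial(link = "logit"))

# One fitted value per row of df_cc, in the same order, so binding by position lines up
glm_pred <- matrix(fitted(fit), ncol = 1, dimnames = list(NULL, "glm_pred"))
df_glm <- cbind(df_cc, glm_pred)
print(df_glm)
```

If the complete-case filter were computed on a different subset than the one being bound (for example, a shorter slice of the data), the positions would no longer correspond.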
Upvotes: 1