Reputation: 157
I'm trying to do a logistic regression using glm in R. My data sheet measurement.csv is structured like the following:
subject,intensity,infarcted
MR101325,1.05767712056061,1
MR1017924,0.942893526332193,1
MR1034135,1.04579279903598,1
MR1048784,0.782340895322641,1
MR1085298,0.806187306821611,0
MR1132243,0.856600956071013,0
MR1140359,0.709137967989653,0
MR1142453,0.601887753777769,0
I use the following to read my data into R and plot it:
data1 <- read.csv("measurement.csv", header=TRUE, stringsAsFactors=FALSE)
plot(x=data1$intensity, y=data1$infarcted)
Now I want to fit a logistic regression using glm and add the line to the plot:
glm.stroke=glm(data1$infarcted ~ data1$intensity, data = data1, family = binomial)
lines(data1$infarcted,glm.stroke$fitted.values)
The last line leads to the error: Error in xy.coords(x, y) : 'x' and 'y' lengths differ. I suspect the problem lies in the way I obtain glm.stroke$fitted.values, but I can't figure out exactly what is wrong.
Upvotes: 0
Views: 158
Reputation: 226192
If you have missing (NA) values in your data set, you might want to use na.action=na.exclude in your fit, then use fitted(glm.stroke) instead of glm.stroke$fitted.values. (Also, don't include data1$ in your formula: use infarcted ~ intensity.)
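To illustrate the point about missing values, here is a minimal sketch on a made-up data frame (the name d and its values are hypothetical, not the asker's data). With the default NA handling the fitted values are shorter than the data, which is one way to get the "lengths differ" error; with na.exclude, fitted() pads the dropped rows with NA so the lengths match again.

```r
# Hypothetical toy data with one missing intensity value (row 3)
d <- data.frame(
  intensity = c(1.06, 0.94, NA, 0.78, 0.81, 0.86, 0.71, 0.60),
  infarcted = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# Default NA handling (na.omit): the incomplete row is silently dropped,
# so the stored fitted values have one element fewer than nrow(d)
fit_omit <- glm(infarcted ~ intensity, data = d, family = binomial)
n_omit <- length(fit_omit$fitted.values)

# na.exclude: same model, but fitted() re-inserts NA for the dropped row,
# so the result lines up with the rows of d
fit_excl <- glm(infarcted ~ intensity, data = d, family = binomial,
                na.action = na.exclude)
n_excl <- length(fitted(fit_excl))
```

Note that fit_excl$fitted.values is still the shorter vector; only the fitted() extractor does the padding.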
This plot might make more sense:
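# assumes glm.stroke was fit with the formula infarcted ~ intensity
plot(infarcted ~ intensity, data = data1)
# evenly spaced grid of intensities spanning the observed range
pframe <- data.frame(intensity = seq(min(data1$intensity, na.rm = TRUE),
                                     max(data1$intensity, na.rm = TRUE),
                                     length.out = 51))
# predicted probability of infarction at each grid point
pframe$infprob <- predict(glm.stroke, newdata = pframe, type = "response")
with(pframe, lines(intensity, infprob))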
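If you would rather draw the line through the fitted values at the observed intensities, as the question attempted, something like the following sketch should work. It inlines the eight sample rows from the question so that it runs on its own; ord, x_sorted and p_sorted are names introduced here for illustration. The two differences from the original attempt are that intensity (not infarcted) goes on the x axis, and that the points are sorted by intensity so lines() does not zigzag.

```r
# Sample rows from the question, inlined so the sketch is self-contained
data1 <- read.csv(text = "subject,intensity,infarcted
MR101325,1.05767712056061,1
MR1017924,0.942893526332193,1
MR1034135,1.04579279903598,1
MR1048784,0.782340895322641,1
MR1085298,0.806187306821611,0
MR1132243,0.856600956071013,0
MR1140359,0.709137967989653,0
MR1142453,0.601887753777769,0", stringsAsFactors = FALSE)

glm.stroke <- glm(infarcted ~ intensity, data = data1, family = binomial,
                  na.action = na.exclude)

# Sort by intensity so the line is drawn left to right
ord <- order(data1$intensity)
x_sorted <- data1$intensity[ord]
p_sorted <- fitted(glm.stroke)[ord]

plot(infarcted ~ intensity, data = data1)
lines(x_sorted, p_sorted)
```

With only a handful of observations this line will look jagged, which is why predicting on an evenly spaced grid usually gives a nicer curve.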
Upvotes: 1